Abstract. We study repeated games where players employ an exponential 
learning scheme in order to adapt to an ever-changing environment. If the 
game's payoffs are subject to random perturbations, this scheme leads to a 
new stochastic version of the replicator dynamics that is quite different from 
the "aggregate shocks" approach of evolutionary game theory. Irrespective 
of the perturbations' magnitude, we find that strategies which are dominated 
(even iteratively) eventually become extinct and that the game's strict Nash 
equilibria are stochastically asymptotically stable. We complement our analy- 
sis by illustrating these results in the case of congestion games. 



1. Introduction 

Ever since it was introduced in [1], the notion of a Nash equilibrium and its 
refinements have remained among the most prominent solution concepts of non-cooperative 
game theory. The reason for this is pretty simple and lies at the heart of any 
competitive scenario: the fear of losing is a strong deterrent for any rational player 
who might consider defecting unilaterally from such an equilibrium. 

Still, this discouragement does little to settle the issue of why and how players 
might have arrived at equilibrial strategies in the first place. After all, the complexity 
of most games increases exponentially with the number of players and, hence, 
identifying a game's equilibria quickly becomes prohibitively difficult. Accordingly, 
as was first pointed out by Aumann [2], a player has no incentive to play his com- 
ponent of a Nash equilibrium unless he is convinced that all other players will play 
theirs. And if the game in question has multiple Nash equilibria, this argument 
gains additional momentum: in that case, even players with unbounded deductive 
capabilities will be hard-pressed to choose a strategy. 

From this point of view, rational individuals would appear to be more in tune 
with Aumann's notion of a correlated equilibrium [2] where subjective beliefs are 
also taken into account. Nevertheless, the seminal work of Maynard Smith on 
animal conflicts [3] has cast Nash equilibria in a different light because it unearthed 
a profound connection between evolution and rationality: roughly speaking, one 
leads to the other. So, when different species contend for the limited resources 
of their habitat, evolution and natural selection steer the ensuing conflict to an 
equilibrial state which leaves no room for irrational behaviour. As a consequence, 
instinctive "fight or flight" responses that are deeply ingrained in a species can be 
seen as a form of rational behaviour, acquired over the species' evolutionary course. 

Of course, this evolutionary approach concerns large populations of different 
species which are rarely encountered outside the realm of population biology. How- 
ever, the situation is not much different in the case of a finite number of players who 
try to learn the game by playing again and again and who strive to do better with 
the help of some learning algorithm. Therein, evolution does not occur as part of 
a birth/death process; rather, it is a byproduct of the players' acquired experience 
in playing the game - see [4] for a most comprehensive account. 

In both approaches, a fundamental selection mechanism is that of the replicator 
dynamics [5, 6] which reinforces a strategy proportionately to the difference of its 
payoff from the mean (taken over the species or the player's strategies, depending 
on the approach). As was shown by Samuelson and Zhang in the multi-population 
setting of [7] (which is closer to learning than the self-interacting single-population 
scenarios of [5,6]), these dynamics are particularly conducive to rationality. Strategies 
that are suboptimal when paired against any choice of one's adversaries rapidly 
become extinct and, in the long run, only rationally admissible strategies can survive. 
Even more to the point, the only attracting states of the dynamics turn out to 
be precisely the (strict) Nash equilibria of the game - see [8] for a masterful survey. 

We thus see that Nash equilibria arise over time as natural attractors for rational 
individuals. This fact further justifies their prominence among non-cooperative so- 
lution concepts but it is also conditional on the underlying game remaining station- 
ary throughout the time horizon that it takes players to adapt to it. Unfortunately, 
in many practical applications this stationarity assumption cannot be met: in bi- 
ological models, the reproductive fitness of an individual may be affected by the 
ever-changing weather conditions; in networks, communication channels carry time- 
dependent noise and interference as well as signals; and when players try to sample 
their strategies, they might have to deal with erroneous or imprecise readings. 

It is thus logical to ask: does rational behaviour still emerge in the presence of 
stochastic perturbations that interfere with the underlying game? 

In evolutionary games, these perturbations traditionally take the form of "aggre- 
gate shocks" that are applied directly to the population of each phenotype. This 
approach of Fudenberg and Harris [9] has spurred quite a bit of interest and there 
are a number of features that differentiate it from the deterministic one. In [10] 
Cabrales showed that dominated strategies indeed become extinct, but only if the 
variance of the shocks is low enough. More recently, the work of Imhof and Hof- 
bauer [11,12] revealed that even equilibrial play arises given enough time but again, 
conditionally on the variance of the shocks. 

Be that as it may, if one looks at games with a finite number of players, it is hardly 
relevant to consider shocks of this type because there are no longer any populations 
to apply them to. Instead, the stochastic fluctuations should be reflected directly 
on the stimuli that incite players to change their strategies: their payoffs. 

The particular stimulus-response model that we consider is simple enough: play- 
ers keep cumulative scores of their strategies' performance and employ exponentially 
more often the one that scores better. After a few preliminaries in section 2, this 
approach is made precise in section 3, where we derive the stochastic replicator equation 
that governs the behaviour of players when their learning curves are subject 
to random perturbations. 

The replicator equation that we get is different from the "aggregate shocks" ap- 
proach of [9-12] and, as a result, it exhibits markedly different rationality properties 
as well. In stark contrast to the results of [10, 11], we show in section 4 that dominated 
strategies become extinct irrespective of the noise level. In fact, by induction 
on the rounds of elimination of dominated strategies, we show that this is true even 
for iteratively dominated strategies: despite the noise, only rationally admissible 
strategies can survive in the long run. 

We then begin addressing the issue of equilibrial play in section 5 by making a 
suggestive detour in the land of congestion games.^1 If the noise is relatively mild 
with respect to the rate with which players learn, we find that the game's poten- 
tial is a Lyapunov function which ensures that strict equilibria are stochastically 
attracting; and if the game is dyadic (i.e. players only have two choices), we can 
drop this assumption altogether. 

Encouraged by the results of section 5, we attack the general case in section 6. 
As it turns out, strict equilibria are always stochastically asymptotically stable in 
the perturbed replicator dynamics that stem from exponential learning. This begs 
to be compared to the results of [11, 12] where it is the equilibria of a suitably 
modified game that are stable, and not necessarily those of the actual game being 
played. Fortunately, exponential learning seems to give players a clearer picture of 
the original game and there is no need for similar modifications in our case. 

Notational Conventions. Given a finite set $A = \{\alpha_0, \dots, \alpha_n\}$, we will routinely 
identify the set $\Delta(A)$ of probability measures on $A$ with the standard $n$-dimensional 
simplex of $\mathbb{R}^{n+1}$: $\Delta(A) = \{x \in \mathbb{R}^{n+1} : \sum_\alpha x_\alpha = 1 \text{ and } x_\alpha \geq 0\}$. Under this 
identification, we will also make no distinction between the elements $\alpha$ of $A$ and 
the vertices of $\Delta(A)$, unless doing so would cause undue confusion. 

To streamline our presentation, we will consistently employ Latin indices for 
players ($i, j, k, \dots$) and Greek for their strategies ($\alpha, \beta, \mu, \dots$), separating the two by 
a comma when it would have been aesthetically unpleasant not to. In like manner, 
when we have to discriminate between strategies, we will assume that indices from 
the first half of the Greek alphabet start at 0 ($\alpha, \beta = 0, 1, 2, \dots$) while those taken 
from the second half start at 1 ($\mu, \nu = 1, 2, \dots$). 

Finally, if $X(t)$ is some stochastic process in $\mathbb{R}^n$ starting at $X(0) = x$, its law 
will be denoted by $P_{X;x}$ or simply by $P_x$ if there is no danger of confusion; and if 
the context leaves no doubt as to which process we are referring to, we will employ 
the term "almost surely" in place of the somewhat unwieldy "$P_x$-almost surely". 

2. Preliminaries 

2.1. Basic Facts and Definitions from Game Theory. As is customary, our 
starting point will be a (finite) set of $N$ players, indexed by $i \in \mathcal{N} = \{1, \dots, N\}$. 
The players' possible actions are drawn from their strategy sets $\mathcal{S}_i = \{0, 1, \dots, S_i - 1\}$ 
and they can combine these (pure) strategies by choosing $\alpha_i \in \mathcal{S}_i$ with probability 
$p_{i\alpha_i}$. In that case, the players' mixed strategies will be described by the points 
$p_i = (p_{i,0}, p_{i,1}, \dots) \in \Delta_i := \Delta(\mathcal{S}_i)$ or, more succinctly, by the strategy profile 
$p = (p_1, \dots, p_N) \in \Delta := \prod_i \Delta_i$. Alternatively, if we wish to focus on the strategies of 
a particular player $i \in \mathcal{N}$ against the ones of his opponents $\mathcal{N}_{-i} := \mathcal{N} \setminus \{i\}$, we 
will employ the shorthand notation $(p_{-i}; q) = (p_1, \dots, q, \dots, p_N)$ to denote the profile 
where $i$ plays $q \in \Delta_i$ against his opponents' strategy $p_{-i} \in \Delta_{-i} := \prod_{j \neq i} \Delta_j$. 

^1 Incidentally, this was our original motivation for considering randomly fluctuating payoffs: 
travel times and delays in traffic models are not determined solely by the players' choices but also 
by the fickle interference of nature. 

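Now, once players have made their strategic choices, they will be rewarded according 
to the (multilinear) payoff functions $u_i : \Delta \to \mathbb{R}$: 

(2.1) $u_i(p) = \sum_{\alpha_1 \in \mathcal{S}_1} \cdots \sum_{\alpha_N \in \mathcal{S}_N} u_{i,\alpha_1 \dots \alpha_N} \, p_{1,\alpha_1} \cdots p_{N,\alpha_N}$ 

where $u_{i,\alpha_1 \dots \alpha_N}$ is the reward of player $i$ in the profile $(\alpha_1, \dots, \alpha_N) \in \mathcal{S} := \prod_i \mathcal{S}_i$, i.e. 
the payoff that strategy $\alpha_i \in \mathcal{S}_i$ yields to player $i$ against the strategy $\alpha_{-i} \in \mathcal{S}_{-i} := 
\prod_{j \neq i} \mathcal{S}_j$ of $i$'s opponents. Under this light, the payoff that a player receives when 
playing a pure strategy $\alpha \in \mathcal{S}_i$ deserves special mention and will be given by: 

(2.2) $u_{i\alpha}(p) := u_i(p_{-i}; \alpha) = u_i(p_1, \dots, \alpha, \dots, p_N)$. 

This collection of players $i \in \mathcal{N}$, their strategies $\alpha_i \in \mathcal{S}_i$ and their payoffs $u_i$ will 
be our working definition for a game in normal form, usually denoted by $\mathfrak{G}$, or 
$\mathfrak{G}(\mathcal{N}, \mathcal{S}, u)$ if we need to keep track of more data. 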
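As a concrete illustration of (2.1) and (2.2), the following minimal sketch evaluates the multilinear payoff of a mixed profile by summing over pure profiles. The function names and the Prisoner's Dilemma payoff table are hypothetical choices made for this example only.

```python
from itertools import product

def expected_payoff(payoff, profile, player):
    """u_i(p) as in (2.1): sum over pure profiles of u_{i,a_1...a_N} * prod_j p_{j,a_j}."""
    strategy_sets = [range(len(p)) for p in profile]
    total = 0.0
    for pure in product(*strategy_sets):
        weight = 1.0
        for j, a in enumerate(pure):
            weight *= profile[j][a]
        total += payoff(pure)[player] * weight
    return total

def pure_payoff(payoff, profile, player, alpha):
    """u_{i,alpha}(p) = u_i(p_{-i}; alpha) as in (2.2): player i switches to the vertex alpha."""
    vertex = [1.0 if a == alpha else 0.0 for a in range(len(profile[player]))]
    deviated = list(profile)
    deviated[player] = vertex
    return expected_payoff(payoff, deviated, player)

# Hypothetical Prisoner's Dilemma (assumption): strategy 0 = cooperate, 1 = defect.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

def pd_payoff(pure):
    return PD[pure]

p = [[0.5, 0.5], [0.25, 0.75]]
u0 = expected_payoff(pd_payoff, p, 0)
```

By multilinearity, `u0` equals the average of the two pure payoffs `pure_payoff(pd_payoff, p, 0, a)` weighted by player 0's own mixed strategy.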

Needless to say, rational players who seek to maximize their individual payoffs 
will avoid strategies that always lead to diminished payoffs against any play of their 
opponents. We will thus say that the strategy $q_i \in \Delta_i$ is (strictly) dominated by 
$q_i' \in \Delta_i$ and we will write $q_i \prec q_i'$ when 

(2.3) $u_i(p_{-i}; q_i) < u_i(p_{-i}; q_i')$ 

for all strategies $p_{-i} \in \Delta_{-i}$ of $i$'s opponents $\mathcal{N}_{-i}$.^2 

With this in mind, dominated strategies can be effectively removed from the 
analysis of a game because rational players will have no incentive to ever use them. 
However, by deleting such a strategy, another strategy (perhaps of another player) 
might become dominated and further deletions of iteratively dominated strategies 
might be in order (see section [4] for more details). Proceeding ad infinitum, we will 
say that a strategy is rationally admissible if it survives every round of elimination 
of dominated strategies. If the set of rationally admissible strategies is a singleton 
(e.g. as in the Prisoner's Dilemma), the game will be called dominance-solvable 
and the sole surviving strategy will be the game's rational solution. 
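The elimination procedure can be sketched as follows. This is a simplified illustration rather than the general definition: it only tests domination by other pure strategies (domination by mixed strategies, which (2.3) also allows, is not checked), and the payoff table is a hypothetical Prisoner's Dilemma. Checking against pure opponent profiles suffices here because the payoffs are multilinear.

```python
from itertools import product

def iterated_elimination(payoff, sizes):
    """Pure strategies surviving iterated elimination of strategies that are
    strictly dominated by another surviving pure strategy (simplifying assumption)."""
    alive = [list(range(s)) for s in sizes]
    changed = True
    while changed:
        changed = False
        for i in range(len(sizes)):
            others = [alive[j] for j in range(len(sizes)) if j != i]

            def u(a, rest):
                # Rebuild the full pure profile with player i playing a.
                pure = list(rest)
                pure.insert(i, a)
                return payoff(tuple(pure))[i]

            for a in list(alive[i]):
                dominated = any(
                    all(u(a, rest) < u(b, rest) for rest in product(*others))
                    for b in alive[i] if b != a
                )
                if dominated:
                    alive[i].remove(a)
                    changed = True
    return alive

# Hypothetical Prisoner's Dilemma (assumption): strategy 0 = cooperate, 1 = defect.
PD = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}
survivors = iterated_elimination(lambda pure: PD[pure], [2, 2])
```

In this example only defection survives for each player, so the game is dominance-solvable in the sense described above.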

Then again, not all games can be solved in this way and it is natural to look for 
strategies which are stable at least under unilateral deviations. Hence, we will say 
that a strategy profile $p \in \Delta$ is a Nash equilibrium of the game when 

(2.4) $u_i(p) \geq u_i(p_{-i}; q)$ for all $q \in \Delta_i$, $i \in \mathcal{N}$. 

If the equilibrium profile $p$ only contains pure strategies $\alpha_i \in \mathcal{S}_i$, we will refer to it 
as a pure equilibrium; and if the inequality (2.4) is strict for all $q \in \Delta_i$ with $q \neq p_i$, $i \in \mathcal{N}$, 
the equilibrium $p$ will instead be called strict. Clearly, only pure 
profiles can satisfy the strict version of (2.4) and therefore strict equilibria must also 
be pure. The converse implication is false but only barely so: a pure equilibrium 
fails to be strict only if a player has more than one pure strategy that returns the 
same reward. Since this occurrence has measure zero, we will relax our terminology 
somewhat and use the two terms interchangeably. 

^2 The adjective "strict" characterizes the inequality (2.3): if the inequality is not strict, $q_i$ will 
be called weakly dominated by $q_i'$ and we will write $q_i \preccurlyeq q_i'$. Because our primary interest lies in 
strictly dominated strategies, "dominated" should always be taken to mean "strictly dominated". 

To recover now the connection of equilibrial play with strategic dominance, note 
that if a game is solvable by iterated elimination of dominated strategies, the single 
rationally admissible strategy that survives will be the game's unique strict equilibrium. 
But the significance of strict equilibria is not exhausted here: strict equilibria 
are exactly the evolutionarily stable strategies of multi-population evolutionary 
games.^3 Moreover, as we shall see a bit later, they are the only asymptotically 
stable states of the multi-population replicator dynamics [8]. 

Unfortunately, not all games possess strict equilibria (Rock-Paper-Scissors is the 
typical counterexample). Nevertheless, pure equilibria do exist in many large and 
interesting classes of games, even when we leave out dominance-solvable ones. Per- 
haps the most noteworthy such class is that of congestion games: 

Definition 2.1. A game $\mathfrak{G} = \mathfrak{G}(\mathcal{N}, \mathcal{S}, u)$ will be called a congestion game when: 

(1) all players $i \in \mathcal{N}$ share a common set of facilities $\mathcal{F}$ as their strategy set: 
$\mathcal{S}_i = \mathcal{F}$ for all $i \in \mathcal{N}$; 

(2) the payoffs are functions of the number of players sharing a particular facility: 
$u_{i,\alpha_1 \dots \alpha \dots \alpha_N} = u_\alpha(N_\alpha)$ where $N_\alpha = \sum_j \delta_{\alpha_j \alpha}$ is the number of players 
choosing the same facility as $i$. 

Amazingly enough, it turns out that these games are equivalent to the class of 
potential games, first studied by Monderer and Shapley in [13]: 

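Definition 2.2. A game $\mathfrak{G} = \mathfrak{G}(\mathcal{N}, \mathcal{S}, u)$ will be called a potential game if there 
exists a function $V : \Delta \to \mathbb{R}$ such that: 

(2.5) $u_i(p_{-i}; q_i) - u_i(p_{-i}; q_i') = -\big(V(p_{-i}; q_i) - V(p_{-i}; q_i')\big)$ 

for all players $i \in \mathcal{N}$ and all strategies $p_{-i} \in \Delta_{-i}$, $q_i, q_i' \in \Delta_i$. 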

This equivalence reveals that both classes of games possess equilibria in pure 
strategies: it suffices to look at the vertices of the face of A where the (necessarily 
multilinear) potential function V is minimised. 
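One direction of this equivalence can be checked numerically on a small example. The sketch below uses the standard construction due to Rosenthal, $V = -\sum_\alpha \sum_{k=1}^{N_\alpha} u_\alpha(k)$, which is not spelled out in the text and is brought in here as an assumption; the sign is chosen to match (2.5). The check covers pure profiles and unilateral pure deviations only (the mixed case then follows by multilinearity), and the facility payoffs are hypothetical.

```python
from itertools import product

def congestion_payoffs(profile, facility_payoff):
    """Payoff of each player: u_alpha(N_alpha), where N_alpha counts the users of alpha."""
    load = {a: profile.count(a) for a in set(profile)}
    return [facility_payoff[a](load[a]) for a in profile]

def rosenthal_potential(profile, facility_payoff):
    """V = -(u_alpha(1) + ... + u_alpha(N_alpha)), summed over the facilities in use."""
    load = {a: profile.count(a) for a in set(profile)}
    return -sum(facility_payoff[a](k) for a in load for k in range(1, load[a] + 1))

def check_potential(n_players, facility_payoff):
    """Verify u_i(new) - u_i(old) = -(V(new) - V(old)) for every unilateral pure deviation."""
    facilities = range(len(facility_payoff))
    for profile in product(facilities, repeat=n_players):
        for i in range(n_players):
            for b in facilities:
                deviated = profile[:i] + (b,) + profile[i + 1:]
                du = (congestion_payoffs(deviated, facility_payoff)[i]
                      - congestion_payoffs(profile, facility_payoff)[i])
                dV = (rosenthal_potential(deviated, facility_payoff)
                      - rosenthal_potential(profile, facility_payoff))
                if abs(du + dV) > 1e-12:
                    return False
    return True

# Hypothetical delays (assumption): payoffs are negative travel times growing with the load.
roads = [lambda n: -1.0 * n, lambda n: -0.5 * n - 1.0]
potential_ok = check_potential(3, roads)
```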

2.2. Learning, Evolution and the Replicator Dynamics. As one would ex- 
pect, locating the Nash equilibria of a game is a rather complicated problem that 
requires a great deal of global calculations, even in the case of potential games 
(where it reduces to minimising a multilinear function over a convex polytope). 
Consequently, it is of interest to see whether there are simple and distributed learn- 
ing schemes that allow players to arrive at a reasonably stable solution. 

One such scheme is based on an exponential learning behaviour where players 
play the game repeatedly and keep records of their strategies' performance. In more 
detail, at each instance of the game all players $i \in \mathcal{N}$ update the cumulative scores 
$U_{i\alpha}$ of their strategies $\alpha \in \mathcal{S}_i$ as specified by the recursive formula: 

(2.6) $U_{i\alpha}(t+1) = U_{i\alpha}(t) + u_{i\alpha}(p(t))$ 

where $p(t) \in \Delta$ is the players' strategy profile at the $t$-th iteration of the game.^4 
These scores reinforce the perceived success of each strategy as measured by the 
average payoff it yields and hence, it stands to reason that players will lean towards 
the strategy with the highest score. The precise way in which they do that is by 
playing according to the namesake exponential law: 

(2.7) $p_{i\alpha}(t+1) = \frac{e^{U_{i\alpha}(t)}}{\sum_\beta e^{U_{i\beta}(t)}}$. 

^3 This is not true in the single-population case: for example, there exists a fully mixed evolutionarily 
stable strategy in the Hawk-Dove game. 

^4 In the absence of initial bias, we assume that $U_{i\alpha}(0) = 0$ for all $i \in \mathcal{N}$, $\alpha \in \mathcal{S}_i$. 

For simplicity, we will only consider the case where players update their scores 
in continuous time, i.e. according to the coupled equations: 

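(2.8a) $dU_{i\alpha}(t) = u_{i\alpha}(x(t)) \, dt$ 

(2.8b) $x_{i\alpha}(t) = \frac{e^{U_{i\alpha}(t)}}{\sum_\beta e^{U_{i\beta}(t)}}$. 

Then, if we differentiate (2.8b) to decouple it from (2.8a), we obtain the standard 
(multi-population) replicator dynamics: 

(2.9) $\frac{dx_{i\alpha}}{dt} = x_{i\alpha} \Big( u_{i\alpha}(x) - \sum_\beta x_{i\beta} u_{i\beta}(x) \Big) = x_{i\alpha} \big( u_{i\alpha}(x) - u_i(x) \big)$. 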
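To see the scheme at work, here is a minimal sketch of the discrete-time score update (2.6) followed by the exponential law (2.7) for a two-player game, up to the indexing convention for $t$. The function names and the Prisoner's Dilemma payoffs are hypothetical, and scores start at zero as in the no-bias assumption.

```python
import math

def softmax(scores):
    """Exponential law: probabilities proportional to exp(score), computed stably."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

def exponential_learning(payoff_matrices, steps):
    """Iterate the score update and the exponential law for a 2-player game.
    payoff_matrices[i][a][b]: payoff to player i when i plays a and the opponent plays b."""
    n = [len(payoff_matrices[0]), len(payoff_matrices[1])]
    scores = [[0.0] * n[0], [0.0] * n[1]]  # U_{i,alpha}(0) = 0
    profile = [softmax(s) for s in scores]
    for _ in range(steps):
        for i in (0, 1):
            opp = profile[1 - i]
            for a in range(n[i]):
                # u_{i,alpha}(p): payoff of the pure strategy a against the opponent's mix
                scores[i][a] += sum(payoff_matrices[i][a][b] * opp[b]
                                    for b in range(n[1 - i]))
        profile = [softmax(s) for s in scores]
    return profile

# Hypothetical Prisoner's Dilemma (assumption): 0 = cooperate, 1 = defect.
pd = [[3.0, 0.0], [5.0, 1.0]]
final_profile = exponential_learning([pd, pd], steps=50)
```

Since defection earns at least one unit more than cooperation against any opponent mix, the score gap grows by at least one per round and the share of cooperation decays at least exponentially.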

Alternatively, if players learn at different speeds as a result of varied stimulus- 
response characteristics, their updating will take the form: 

(2.10) $x_{i\alpha}(t) = \frac{e^{\lambda_i U_{i\alpha}(t)}}{\sum_\beta e^{\lambda_i U_{i\beta}(t)}}$ 

where $\lambda_i$ represents the learning rate of player $i$, i.e. the "weight" which he assigns 
to his perceived scores $U_{i\alpha}$. In this way, the replicator equation evolves at a different 
time scale for each player, leading to the rate-adjusted dynamics: 

(2.11) $\frac{dx_{i\alpha}}{dt} = \lambda_i x_{i\alpha} \big( u_{i\alpha}(x) - u_i(x) \big)$. 

Naturally, the uniform dynamics (2.9) are recovered when all players learn at the 
"standard" rate $\lambda_i = 1$. 

If we view the exponential learning model (2.7) from a stimulus-response angle, 
we see that the payoff of a strategy simply represents an (exponential) propensity 
of employing said strategy. It is thus closely related to the algorithm of logistic 
fictitious play [4] where the strategy $x_i$ of (2.10) can be seen as the (unique) best 
reply to the profile $x_{-i}$ in some suitably modified payoffs $v_i(x) = u_i(x) + \frac{1}{\lambda_i} H(x_i)$. 
Interestingly enough, $H(x_i)$ turns out to be none other than the entropy of $x_i$: 

(2.12) $H(x_i) = -\sum_{\beta : x_{i\beta} > 0} x_{i\beta} \log x_{i\beta}$. 

That being so, we deduce that the learning rates $\lambda_i$ act the part of (player-specific) 
inverse temperatures: in high temperatures (small $\lambda_i$), the players' learning curves 
are "soft" and the payoff differences between strategies are toned down; on the 
contrary, if $\lambda_i \to \infty$ the scheme "freezes" to a myopic best-reply process. 
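The best-reply interpretation can be probed numerically. The sketch below treats the payoffs of one player's pure strategies against a fixed opponent profile as a given vector (the numbers are hypothetical), and compares the entropy-regularised payoff at the logit reply with its value at randomly sampled mixed strategies. It is a sanity check under these assumptions, not a proof.

```python
import math
import random

def entropy(x):
    """H(x) = -sum of x_b log x_b over the support of x, as in (2.12)."""
    return -sum(v * math.log(v) for v in x if v > 0)

def logit_reply(payoffs, rate):
    """Exponential weights with learning rate `rate` (the inverse temperature)."""
    m = max(payoffs)
    w = [math.exp(rate * (u - m)) for u in payoffs]
    z = sum(w)
    return [v / z for v in w]

def regularized_payoff(x, payoffs, rate):
    """Linear payoff plus entropy divided by the learning rate."""
    return sum(xa * ua for xa, ua in zip(x, payoffs)) + entropy(x) / rate

rng = random.Random(0)

def random_simplex_point(n):
    w = [rng.expovariate(1.0) for _ in range(n)]
    s = sum(w)
    return [v / s for v in w]

payoffs = [1.0, 0.5, -0.2]  # hypothetical payoffs of the pure strategies (assumption)
rate = 2.0
best = logit_reply(payoffs, rate)
best_value = regularized_payoff(best, payoffs, rate)
# Smallest advantage of the logit reply over 1000 random mixed strategies.
gap = min(best_value - regularized_payoff(random_simplex_point(3), payoffs, rate)
          for _ in range(1000))
```

Raising `rate` concentrates `logit_reply` on the highest payoff, which is the "freezing" to a best reply described above.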

Now, the replicator dynamics were first derived in [5] in the context of popula- 
tion biology: first for different phenotypes within a single species (single-population 
models) and then for different species altogether (multi-population models; [8] 
and [14] provide excellent surveys). In both these cases, one begins with large pop- 
ulations of individuals that are programmed to a particular behaviour (e.g. fight 
for "hawks" or flight for "doves") and matches them randomly in a game whose 
payoffs directly affect the reproductive fitness of the individual players. 

More precisely, let $z_{i\alpha}(t)$ be the population size of the phenotype (strategy) 
$\alpha \in \mathcal{S}_i$ of species (player) $i \in \mathcal{N}$ in some multi-population model where individuals 
are matched to play a game $\mathfrak{G}$ with payoff functions $u_i$. Then, the relative frequency 
(share) of $\alpha$ will be specified by the population state $x = (x_1, \dots, x_N) \in \Delta$ where 
$x_{i\alpha} = z_{i\alpha} / \sum_\beta z_{i\beta}$. So, if $N$ individuals are drawn randomly from the $N$ species, 
their expected payoffs will be given by $u_i(x)$, $i \in \mathcal{N}$, and if these payoffs represent 
a proportionate increase in the phenotype's fitness (measured as the number of 
offspring in the unit of time), we will have: 

(2.13) $dz_{i\alpha}(t) = z_{i\alpha}(t) \, u_{i\alpha}(x(t)) \, dt$. 

As a result, the population state $x(t)$ will evolve according to: 

(2.14) $\frac{dx_{i\alpha}}{dt} = \frac{1}{\sum_\beta z_{i\beta}} \frac{dz_{i\alpha}}{dt} - \frac{z_{i\alpha}}{\big(\sum_\beta z_{i\beta}\big)^2} \sum_\beta \frac{dz_{i\beta}}{dt} = x_{i\alpha} \big( u_{i\alpha}(x) - u_i(x) \big)$ 

which is exactly (2.9) viewed from an evolutionary perspective. 

On the other hand, we should note here that in single-population models the re- 
sulting equation is cubic and not quadratic because strategies are matched against 
themselves. To be specific, assume that individuals are randomly drawn from a large 
population and are matched against one another in a (symmetric) 2-player game $\mathfrak{G}$ 
with strategy space $\mathcal{S} = \{1, \dots, S\}$ and payoff matrix $u = \{u_{\alpha\beta}\}$. Then, if $x_\alpha$ denotes 
the population share of individuals that are programmed to the strategy $\alpha \in \mathcal{S}$, their 
expected payoff in a random match will be given by $u_\alpha(x) := \sum_\beta u_{\alpha\beta} x_\beta \equiv u(\alpha, x)$; 
similarly, the population average payoff will be $u(x, x) = \sum_\alpha x_\alpha u_\alpha(x)$. Hence, following 
the same procedure as above, we get the single-population replicator dynamics: 

(2.15) $\frac{dx_\alpha}{dt} = x_\alpha \big( u_\alpha(x) - u(x, x) \big)$ 

which behave quite differently from their multi-population counterpart (2.14). 
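The single-population dynamics (2.15) are easy to integrate with a forward Euler step. The sketch below does so for a Hawk-Dove game; the payoff values (resource worth 2, fighting cost 4), the step size and the initial state are all hypothetical choices for illustration.

```python
def replicator_step(x, payoff_matrix, dt):
    """One forward-Euler step of (2.15) for a symmetric 2-player game."""
    n = len(x)
    # u_alpha(x): expected payoff of strategy alpha in a random match
    u = [sum(payoff_matrix[a][b] * x[b] for b in range(n)) for a in range(n)]
    # u(x, x): population average payoff
    mean = sum(x[a] * u[a] for a in range(n))
    return [x[a] + dt * x[a] * (u[a] - mean) for a in range(n)]

# Hypothetical Hawk-Dove payoffs (assumption): index 0 = hawk, 1 = dove.
hawk_dove = [[-1.0, 2.0], [0.0, 1.0]]
x = [0.1, 0.9]
for _ in range(10000):
    x = replicator_step(x, hawk_dove, 0.01)
```

With these numbers the hawk share obeys $\dot{x}_H = x_H (1 - x_H)(1 - 2 x_H)$, so the trajectory settles at the fully mixed state with half hawks and half doves rather than at a pure state.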

As far as rational behaviour is concerned, the replicator dynamics have some 
far-reaching ramifications. If we focus on multi-population models, Samuelson and 
Zhang showed in [7] that the share $x_{i\alpha}(t)$ of a strategy $\alpha \in \mathcal{S}_i$ which is strictly 
dominated (even iteratively) converges to zero along any interior solution path of 
(2.9): in other words, dominated strategies become extinct in the long run. Additionally, 
there is a remarkable equivalence between the game's Nash equilibria and 
the stationary points of the replicator dynamics: the asymptotically stable states of 
(2.9) coincide precisely with the strict Nash equilibria of the underlying game [8].^5 



2.3. Elements of Stability Analysis. A large part of our work will be focused on 
examining whether the rationality properties of exponential learning (elimination 
of dominated strategies and asymptotic stability of strict equilibria) remain true 
in a stochastic setting. However, since asymptotic stability is (usually) too stringent 
an expectation for stochastic dynamical systems, we must instead consider its 
stochastic analogue. 



^5 An important observation to keep in mind is that this is not the case in the dynamics (2.15). 

That being the case, let $W(t) = (W_1(t), \dots, W_n(t))$ be a standard Wiener process 
in $\mathbb{R}^n$ and consider the stochastic differential equation (SDE): 

(2.16) $dX_\alpha(t) = b_\alpha(X(t)) \, dt + \sum_\beta \sigma_{\alpha\beta}(X(t)) \, dW_\beta(t)$. 

Following [15, 16], the notion of asymptotic stability in this SDE is expressed by: 

Definition 2.3. We will say that $q \in \mathbb{R}^n$ is stochastically asymptotically stable 
when, for every neighbourhood $U$ of $q$ and every $\varepsilon > 0$, there exists a neighbourhood 
$V$ of $q$ such that: 

(2.17) $P_x \big\{ X(t) \in U \text{ for all } t \geq 0, \ \lim_{t \to \infty} X(t) = q \big\} \geq 1 - \varepsilon$ 

for all initial conditions $X(0) = x \in V$ of the SDE (2.16). 

Much the same as in the deterministic case, stochastic asymptotic stability is 
often established by means of a Lyapunov function. In our context, this notion 
hinges on the second order differential operator that is associated to the equation 
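(2.16), namely the generator $L$ of $X(t)$: 

(2.18) $L = \sum_{\alpha=1}^{n} b_\alpha(x) \frac{\partial}{\partial x_\alpha} + \frac{1}{2} \sum_{\alpha,\beta=1}^{n} \big( \sigma(x) \sigma^T(x) \big)_{\alpha\beta} \frac{\partial^2}{\partial x_\alpha \partial x_\beta}$. 

The importance of this operator can easily be surmised from Ito's lemma; indeed, 
if $f : \mathbb{R}^n \to \mathbb{R}$ is sufficiently smooth, the generator $L$ simply captures the drift of 
the process $Y(t) = f(X(t))$: 

(2.19) $dY(t) = Lf(X(t)) \, dt + \sum_{\alpha,\beta} \frac{\partial f}{\partial x_\alpha} \Big|_{X(t)} \sigma_{\alpha\beta}(X(t)) \, dW_\beta(t)$. 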



In this way, $L$ can be seen as the stochastic version of the time derivative $\frac{d}{dt}$; this 
analogy then leads to: 

Definition 2.4. Let $q \in \mathbb{R}^n$ and let $U$ be an open neighbourhood of $q$. We will 
say that $f$ is a (local) stochastic Lyapunov function for the SDE (2.16) if: 

(1) $f(x) \geq 0$ for all $x \in U$, with equality iff $x = q$; 

(2) there exists a constant $k > 0$ such that $Lf(x) \leq -k f(x)$ for all $x \in U$. 

Accordingly, whenever such a Lyapunov function exists, it is known that the 
point $q \in \mathbb{R}^n$ where $f$ attains its minimum will be stochastically asymptotically 
stable (for example, see theorem 4 in pages 314-315 of [15]). 
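As a quick worked illustration of Definition 2.4, consider a scalar SDE that does not appear in the text and is chosen here purely as an example: $dX = -X \, dt + s X \, dW$ with a constant noise level $s$, and the candidate function $f(x) = x^2$ around $q = 0$. Applying the generator (2.18) gives:

```latex
% Hypothetical scalar example (assumption, not from the text):
%   dX = -X dt + s X dW,  so  b(x) = -x  and  \sigma(x) = s x.
L = -x \frac{d}{dx} + \frac{1}{2} s^2 x^2 \frac{d^2}{dx^2}
% Candidate Lyapunov function f(x) = x^2:
Lf(x) = -x \cdot 2x + \frac{1}{2} s^2 x^2 \cdot 2 = -(2 - s^2) \, x^2 = -(2 - s^2) \, f(x)
% f \geq 0 with equality iff x = 0, and Lf \leq -k f with k = 2 - s^2 > 0 whenever s^2 < 2.
```

In this toy case the Lyapunov condition only holds for sufficiently mild noise ($s^2 < 2$), which is the kind of restriction on the noise level that the stability analysis has to keep track of.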

3. Learning in the Presence of Noise 

Of course, it could be argued that the rationality properties of the exponential 
learning scheme are a direct consequence of the players' receiving accurate information 
about the game when they update their scores.^6 However, this is a requirement 
that cannot always be met: the interference of nature in the game or imperfect readings 
of one's utility invariably introduce fluctuations in (2.8a), and in their turn, 
these lead to a perturbed version of the replicator dynamics (2.9). 

^6 We should note here that this information has to be accurate but not necessarily global (as in 
the case of fictitious play). Just as in regret-matching schemes [17], players only need to "know 
the game", i.e. the payoff that they would have received if they had chosen a strategy other than 
the one that they actually played. This information is usually easier to acquire than the empirical 
distribution of play: e.g. the received payoffs actually suffice in minority games [18]. 

To account for these random perturbations, we will assume that the players' 
scores are now governed instead by the stochastic differential equation: 

(3.1) $dU_{i\alpha}(t) = u_{i\alpha}(X(t)) \, dt + \eta_{i\alpha}(X(t)) \, dW_{i\alpha}(t)$ 

where, as before, the strategy profile $X(t) \in \Delta$ is given by the exponential law: 

(3.2) $X_{i\alpha}(t) = \frac{e^{U_{i\alpha}(t)}}{\sum_\beta e^{U_{i\beta}(t)}}$ 

and $W(t)$ is a standard Wiener process living in $\mathbb{R}^{\sum_i S_i}$. The difference from the 
deterministic case obviously lies in the additive noise term $\eta_{i\alpha}(X(t)) \, dW_{i\alpha}(t)$ where 
the coefficients $\eta_{i\alpha}$ measure the impact of the noise on the payoffs. Of course, since 
this impact might depend on the state of the game, these coefficients may well 
depend on the strategy profile $X(t)$ themselves; the only assumption that we will 
make is that they be continuous on $\Delta$.^7 In particular, if $\eta_{i\alpha}(x_{-i}; \alpha) = 0$ for all 
$i \in \mathcal{N}$, $\alpha \in \mathcal{S}_i$, $x_{-i} \in \Delta_{-i}$, equation (3.1) becomes a convincing model for the case 
of insufficient information. It states that when a player actually uses a strategy, his 
payoff observations are accurate enough; but with regard to strategies he rarely 
employs, his readings could be arbitrarily off the mark. 

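At any rate, to decouple equations (3.1) and (3.2), we can simply apply Ito's 
lemma to the process $X(t)$. Indeed, since $dW_{j\beta} \cdot dW_{k\gamma} = \delta_{jk} \delta_{\beta\gamma} \, dt$ (recall that 
$W(t)$ has independent components across both players and strategies), we get: 

(3.3) $dX_{i\alpha} = \sum_j \sum_\beta \frac{\partial X_{i\alpha}}{\partial U_{j\beta}} \, dU_{j\beta} + \frac{1}{2} \sum_{j,k} \sum_{\beta,\gamma} \frac{\partial^2 X_{i\alpha}}{\partial U_{j\beta} \, \partial U_{k\gamma}} \, dU_{j\beta} \cdot dU_{k\gamma}$ 
$= \sum_\beta \Big( u_{i\beta}(X) \frac{\partial X_{i\alpha}}{\partial U_{i\beta}} + \frac{1}{2} \eta_{i\beta}^2(X) \frac{\partial^2 X_{i\alpha}}{\partial U_{i\beta}^2} \Big) dt + \sum_\beta \eta_{i\beta}(X) \frac{\partial X_{i\alpha}}{\partial U_{i\beta}} \, dW_{i\beta}$ 

and, after a few more calculations: 

(3.4) $dX_{i\alpha} = X_{i\alpha} \big[ u_{i\alpha}(X) - u_i(X) \big] dt$ 
$+ X_{i\alpha} \Big[ \frac{1}{2} \eta_{i\alpha}^2(X) (1 - 2 X_{i\alpha}) - \frac{1}{2} \sum_\beta \eta_{i\beta}^2(X) X_{i\beta} (1 - 2 X_{i\beta}) \Big] dt$ 
$+ X_{i\alpha} \Big[ \eta_{i\alpha}(X) \, dW_{i\alpha} - \sum_\beta \eta_{i\beta}(X) X_{i\beta} \, dW_{i\beta} \Big]$. 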
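The noisy score dynamics (3.1) together with the exponential law (3.2) can be simulated directly with an Euler-Maruyama step on the scores. The sketch below is only illustrative: it assumes a constant noise coefficient shared by all strategies, a two-player Prisoner's Dilemma with hypothetical payoffs, and arbitrary choices of horizon, step size and seed.

```python
import math
import random

def softmax(scores):
    """Exponential law (3.2), computed stably."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [x / z for x in w]

def noisy_exponential_learning(payoff_matrices, noise, horizon, dt, seed):
    """Euler-Maruyama on the scores of (3.1) for a 2-player game, with a
    constant noise coefficient `noise` (simplifying assumption).
    payoff_matrices[i][a][b]: payoff to player i playing a against b."""
    rng = random.Random(seed)
    n = [len(payoff_matrices[0]), len(payoff_matrices[1])]
    scores = [[0.0] * n[0], [0.0] * n[1]]
    for _ in range(int(horizon / dt)):
        profile = [softmax(s) for s in scores]
        for i in (0, 1):
            opp = profile[1 - i]
            for a in range(n[i]):
                u = sum(payoff_matrices[i][a][b] * opp[b] for b in range(n[1 - i]))
                # drift u dt plus an independent Wiener increment per (player, strategy)
                scores[i][a] += u * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return [softmax(s) for s in scores]

# Hypothetical Prisoner's Dilemma (assumption): 0 = cooperate, 1 = defect.
pd = [[3.0, 0.0], [5.0, 1.0]]
noisy_profile = noisy_exponential_learning([pd, pd], noise=1.0, horizon=200.0, dt=0.01, seed=0)
```

In this run the score gap between defection and cooperation drifts linearly in time while the noise only grows like the square root of time, so the dominated strategy's share becomes negligible despite the perturbations.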



On the other hand, if players update their strategies with different learning rates 
$\lambda_i$, we should instead apply Ito's formula to equation (2.10). In so doing, we obtain: 

(3.4') $dX_{i\alpha} = \lambda_i X_{i\alpha} \big[ u_{i\alpha}(X) - u_i(X) \big] dt$ 
$+ \frac{\lambda_i^2}{2} X_{i\alpha} \Big[ \eta_{i\alpha}^2(X) (1 - 2 X_{i\alpha}) - \sum_\beta \eta_{i\beta}^2(X) X_{i\beta} (1 - 2 X_{i\beta}) \Big] dt$ 
$+ \lambda_i X_{i\alpha} \Big[ \eta_{i\alpha}(X) \, dW_{i\alpha} - \sum_\beta \eta_{i\beta}(X) X_{i\beta} \, dW_{i\beta} \Big]$ 
$= b_{i\alpha}(X) \, dt + \sum_\beta \sigma_{i,\alpha\beta}(X) \, dW_{i\beta}$ 

where, in obvious notation, $b_{i\alpha}(x)$ and $\sigma_{i,\alpha\beta}(x) = \lambda_i x_{i\alpha} \eta_{i\beta}(x) (\delta_{\alpha\beta} - x_{i\beta})$, $x \in \Delta$, 
are respectively the drift and diffusion coefficients of the diffusion $X(t)$. Obviously, 
when $\lambda_i = 1$, we recover the uniform dynamics (3.4); equivalently (and this is 
an interpretation that is well worth keeping in mind), the rates $\lambda_i$ can simply be 
regarded as a commensurate inflation of the payoffs and noise coefficients of player 
$i \in \mathcal{N}$ in the uniform logistic model (3.2). 

^7 In fact, even this assumption can be relaxed: it suffices for $\eta_{i\alpha}$ to be measurable and bounded 
on $\Delta$ (perhaps after redefinition on a null set). As it turns out, the particular form of the 
coefficients is not important and all that matters is their worst value: as long as they are bounded 
(which is more than reasonable from a practical point of view), our results will not be affected. 

Equation (3.4) and its rate-adjusted sibling (3.4') will constitute our stochastic 
version of the replicator dynamics and thus merit some discussion in and by themselves. 
First, note that these dynamics admit a (unique) strong solution for any 
initial state $X(0) = x \in \Delta$, even though they do not satisfy the linear growth condition 
$|b(x)| + |\sigma(x)| \leq C(1 + |x|)$ that is required for the existence and uniqueness 
theorem for SDE's (e.g. theorem 5.2.1 in [19]). Instead, an addition over $\alpha \in \mathcal{S}_i$ reveals 
that every simplex $\Delta_i \subseteq \Delta$ remains invariant under (3.4'): if $X_i(0) = x_i \in \Delta_i$, 
then $d\big(\sum_\alpha X_{i\alpha}\big) = 0$ and hence, $X_i(t)$ will stay in $\Delta_i$ for all $t \geq 0$.^8 So, if $\phi$ is a smooth 
bump function that is equal to 1 on some open neighbourhood $U \supset \Delta$ and which 
vanishes outside some compact set $K \supset U$, the SDE 

(3.5) $dX_{i\alpha} = \phi(X) \Big( b_{i\alpha}(X) \, dt + \sum_\beta \sigma_{i,\alpha\beta}(X) \, dW_{i\beta} \Big)$ 

will have bounded diffusion and drift coefficients and will thus admit a unique strong 
solution. But since this last equation agrees with (3.4') on $\Delta$ and any solution of (3.4') 
always stays in $\Delta$, we can easily conclude that our perturbed replicator dynamics 
admit a unique strong solution for any initial $X(0) = x \in \Delta$. 

It is also important to compare the dynamics (3.4), (3.4') with the "aggregate 
shocks" approach of Fudenberg and Harris [9] that has become the principal incarnation 
of the replicator dynamics in a stochastic environment. So, let us first 
recall how aggregate shocks enter the replicator dynamics in the first place. The 
main idea is that the reproductive fitness of an individual is not only affected by 
deterministic factors but is also subject to stochastic shocks due to the "weather" 
and the interference of nature with the game. More precisely, if $Z_{i\alpha}(t)$ denotes the 
population size of phenotype $\alpha \in \mathcal{S}_i$ of the species $i \in \mathcal{N}$ in some multi-population 
evolutionary game $\mathfrak{G}$, its growth will be determined by: 

(3.6) $dZ_{i\alpha}(t) = Z_{i\alpha}(t) \big( u_{i\alpha}(X(t)) \, dt + \eta_{i\alpha} \, dW_{i\alpha}(t) \big)$ 

where, as in (2.13), $X(t) \in \Delta$ denotes the population shares $X_{i\alpha} = Z_{i\alpha} / \sum_\beta Z_{i\beta}$. 
In this way, Ito's lemma yields the replicator dynamics with aggregate shocks: 

(3.7) $dX_{i\alpha} = X_{i\alpha} \Big[ \big( u_{i\alpha}(X) - u_i(X) \big) - \Big( \eta_{i\alpha}^2 X_{i\alpha} - \sum_\beta \eta_{i\beta}^2 X_{i\beta}^2 \Big) \Big] dt$ 
$+ X_{i\alpha} \Big[ \eta_{i\alpha} \, dW_{i\alpha} - \sum_\beta \eta_{i\beta} X_{i\beta} \, dW_{i\beta} \Big]$. 



We thus see that the effects of noise propagate differently in the case of exponential 
learning and in the case of evolution. Indeed, if we compare equations (3.4) and 
(3.7) term by term, we see that the drifts are not quite the same: even though the 
payoff adjustment $u_{i\alpha} - u_i$ ties both equations back together in the deterministic 
setting ($\eta = 0$), the two expressions differ by 

(3.8) $X_{i\alpha} \Big[ \frac{1}{2} \eta_{i\alpha}^2 - \frac{1}{2} \sum_\beta \eta_{i\beta}^2 X_{i\beta} \Big] dt$. 



^8 Actually, it is not harder to see that every face of $\Delta$ is a trap for $X(t)$. 

Innocuous as this term might seem, it is actually crucial for the rationality properties 
of exponential learning in games with randomly perturbed payoffs: as we 
shall see in the next sections, it leads to some miraculous cancellations that allow 
rationality to emerge in all noise levels. 
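The gap between the two drifts can be checked numerically for a single player. In the sketch below the payoff vector, the noise coefficients and the mixed strategy are random test values (assumptions made for the check only), and the three functions transcribe the drift of the exponential learning dynamics, the drift of the aggregate-shocks dynamics, and the difference term between them.

```python
import random

def learning_drift(x, u, eta):
    """Drift of exponential learning: payoff term plus the Ito correction."""
    mean_u = sum(xb * ub for xb, ub in zip(x, u))
    corr = sum(e * e * xb * (1 - 2 * xb) for xb, e in zip(x, eta))
    return [xa * (ua - mean_u) + 0.5 * xa * (ea * ea * (1 - 2 * xa) - corr)
            for xa, ua, ea in zip(x, u, eta)]

def aggregate_shocks_drift(x, u, eta):
    """Drift of the replicator dynamics with aggregate shocks."""
    mean_u = sum(xb * ub for xb, ub in zip(x, u))
    corr = sum(e * e * xb * xb for xb, e in zip(x, eta))
    return [xa * ((ua - mean_u) - (ea * ea * xa - corr))
            for xa, ua, ea in zip(x, u, eta)]

def drift_gap(x, eta):
    """Difference term: x_a * (eta_a^2 / 2 - sum_b eta_b^2 x_b / 2)."""
    mean_eta2 = sum(e * e * xb for xb, e in zip(x, eta))
    return [0.5 * xa * (ea * ea - mean_eta2) for xa, ea in zip(x, eta)]

rng = random.Random(1)
w = [rng.random() + 0.1 for _ in range(3)]
x = [v / sum(w) for v in w]                      # random interior mixed strategy
u = [rng.uniform(-1.0, 1.0) for _ in range(3)]   # hypothetical payoffs
eta = [rng.uniform(0.0, 2.0) for _ in range(3)]  # hypothetical noise coefficients
error = max(abs(l - s - g) for l, s, g in
            zip(learning_drift(x, u, eta), aggregate_shocks_drift(x, u, eta), drift_gap(x, eta)))
```

Up to floating-point rounding, `error` vanishes, which is the cancellation behind the difference term discussed above.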

Moreover, this difference suggests that we can shift from (3.4) to (3.7) simply 
by modifying the game's payoffs to $\tilde{u}_{i\alpha} = u_{i\alpha} + \frac{1}{2} \eta_{i\alpha}^2$.^9 This modified game was 
precisely the one that came up in the analysis of [11,12] and it plays a pivotal role in 
setting apart learning and evolution in a stochastic setting. In effect, whereas this 
modification seems deeply ingrained in the process of natural selection, exponential 
learning gives players a clearer picture of the actual underlying game. 



Thereby armed with the stochastic replicator equations (3.4), (3.4') to model 
exponential learning in noisy environments, the logical next step is to see if the 
rationality properties of the deterministic dynamics carry over to this stochastic 
setting. In this direction, we will first show that dominated strategies always be- 
come extinct in the long run; only the rationally admissible ones survive. 

As in [10] (implicitly) and [11] (explicitly), the key ingredient of our approach 
will be the cross entropy between two mixed strategies qi,Xi € Ai of player z € ]\f: 



Q:giQ>0 

where H{qi) = — qta log qia is the entropy of qi and oJkl is the intimately related 
Kullback-Leibler divergence (or relative entropy): 



This divergence function is central in the stability analysis of the (deterministic) 
replicator dynamics because it serves as a distance measure in probability space [8] . 
As it stands however, o?kl is not a distance function per se: neither is it symmetric, 
nor does it satisfy the triangle inequality. Still, it has the very useful property 
that dKhiqi^Xi) = oo iff cc^ does not employ a pure strategy a e that is present 
in qi. Therefore, if d}<ii^{qi, Xi) = oo for all dominated strategics qi of player i, it 
immediately follows that Xi cannot be dominated itself. In this vein, we have: 

Proposition 4.1. Let X{t) be a solution of the stochastic replicator dynamics Jg.^p 
for some interior initial condition X{0) ^ x Cz Int(A). Then, if qi G is (strictly) 
dominated: 

(4.3) lim d}<ii^{qi, Xi{t)) = oo almost surely. 

t — >oo 

In particular, if qi = a € §i is pure, we will have limt^oo Xia{t) ~ (a.s.): strictly 
dominated strategics do not survive in the long run. 

Proof. Note first that X{Q) = x ^ Int(A) and hence, Xi{t) will almost surely stay 
in Int(Ai) for all t > Q; this is a simple consequence of the uniqueness of strong 
solutions and the invariancc of the faces of A^ under the dynamics p.4p . 



■^Strictly speaking, this presumes that the noise coefRcicnts ryi^ be constant; the general case 
requires us to allow for games whose payoffs may not be multilinear. This would not really change 
our results but, for the time being, we prefer to avoid this complication. 



4. Extinction of Dominated Strategies 



(4.1) 




(4.2) 
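Let us now consider the cross entropy $G_{q_i}(t)$ between $q_i$ and $X_i(t)$:

(4.4)  $G_{q_i}(t) \equiv H(q_i, X_i(t)) = -\sum_\alpha q_{i\alpha} \log X_{i\alpha}(t).$

As a result of $X_i(t)$ being an interior path, $G_{q_i}(t)$ will remain finite for all $t \geq 0$
(a.s.). So, by applying Itô's lemma we get:

(4.5)  $dG_{q_i} = -\sum_\beta \frac{q_{i\beta}}{X_{i\beta}}\,dX_{i\beta} + \frac{1}{2}\sum_\beta \frac{q_{i\beta}}{X_{i\beta}^2}\,(dX_{i\beta})^2$

and, after substituting $dX_{i\beta}$ from the dynamics (3.4), this last equation becomes:

(4.6)  $dG_{q_i} = \sum_\beta q_{i\beta}\Big[u_i(X) - u_{i\beta}(X) + \frac{1}{2}\sum_\gamma \eta_{i\gamma}^2(X)\, X_{i\gamma}(1 - X_{i\gamma})\Big]\,dt + \sum_\beta q_{i\beta} \sum_\gamma (X_{i\gamma} - \delta_{\beta\gamma})\,\eta_{i\gamma}(X)\,dW_{i\gamma}.$

Accordingly, if $q'_i \in \Delta_i$ is another mixed strategy of player $i$, we readily obtain:

(4.7)  $dG_{q_i} - dG_{q'_i} = \big(u_i(X_{-i}; q'_i) - u_i(X_{-i}; q_i)\big)\,dt + \sum_\beta (q'_{i\beta} - q_{i\beta})\,\eta_{i\beta}(X)\,dW_{i\beta}$

and, after integrating (and writing $G_{q_i - q'_i} \equiv G_{q_i} - G_{q'_i}$ for brevity):

(4.8)  $G_{q_i - q'_i}(t) = H(q_i - q'_i, x) + \int_0^t u_i(X_{-i}(s); q'_i - q_i)\,ds + \sum_\beta (q'_{i\beta} - q_{i\beta}) \int_0^t \eta_{i\beta}(X(s))\,dW_{i\beta}(s)$

Suppose then that $q_i \prec q'_i$ and let $v_i = \inf\{u_i(x_{-i}; q'_i - q_i) : x_{-i} \in \Delta_{-i}\}$. With $\Delta_{-i}$
compact, it easily follows that $v_i > 0$ and the time integral in (4.8) will be bounded
from below by $v_i t$.

On the other hand, since monotonicity fails for Itô integrals, the second term
must be handled with more care. To that end, let $\xi_i(s) = \sum_\beta (q'_{i\beta} - q_{i\beta})\,\eta_{i\beta}(X(s))$
and note that the Cauchy-Schwarz inequality gives:

(4.9)  $\xi_i^2(s) \leq S_i \sum_\beta (q'_{i\beta} - q_{i\beta})^2\,\eta_{i\beta}^2(X(s)) \leq S_i \eta_i^2 \sum_\beta (q'_{i\beta} - q_{i\beta})^2 \leq 2 S_i \eta_i^2$

where $S_i = |\mathcal{S}_i|$ is the number of pure strategies available to player $i$ and $\eta_i =
\max\{|\eta_{i\beta}(x)| : x \in \Delta, \beta \in \mathcal{S}_i\}$; recall also that $q_i, q'_i \in \Delta_i$ for the last step.
Therefore, if $\psi_i(t)$ denotes the integrated martingale part of (4.7), i.e. the last term
of (4.8), and $\rho_i(t)$ is its quadratic variation, the previous inequality becomes:

(4.10)  $\rho_i(t) = [\psi_i, \psi_i](t) = \int_0^t \xi_i^2(s)\,ds \leq 2 S_i \eta_i^2 t.$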

However, by the time-change theorem for martingales (e.g. theorem 3.4.6 in [20]),
there exists a Wiener process $\widetilde{W}_i$ such that $\psi_i(t) = \widetilde{W}_i(\rho_i(t))$. Hence, by the law of
the iterated logarithm we get:

$\liminf_{t\to\infty} G_{q_i - q'_i}(t) \geq \liminf_{t\to\infty} \big(H(q_i - q'_i, x) + v_i t + \widetilde{W}_i(\rho_i(t))\big)$

$\geq \liminf_{t\to\infty} \big[v_i t - \sqrt{2\rho_i(t) \log\log \rho_i(t)}\,\big]$

(4.11)  $\geq \liminf_{t\to\infty} \big[v_i t - 2\eta_i \sqrt{S_i t \log\log(2 S_i \eta_i^2 t)}\,\big] = \infty$  (a.s.)

Since $G_{q_i}(t) \geq G_{q_i}(t) - G_{q'_i}(t) \to \infty$, it follows that $\lim_{t\to\infty} d_{KL}(q_i, X_i(t)) = \infty$
(almost surely) and, with $G_\alpha(t) = -\log X_{i\alpha}(t)$ for all pure strategies $\alpha \in \mathcal{S}_i$, our
proof is complete. □

Footnote: If the coefficients $\eta_{i\beta}$ are not continuous but only bounded (perhaps after redefinition on a
null set), one should take the essential supremum instead and set $\eta_i = \operatorname{ess\,sup}\{\eta_{i\beta}\}$.
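To make proposition 4.1 more tangible, the following sketch simulates exponential learning under noisy payoffs in a Prisoner's Dilemma, where cooperation is strictly dominated. It is only an Euler-Maruyama illustration under stated assumptions: scores evolve as $dU_{i\alpha} = u_{i\alpha}(X)\,dt + \eta\,dW_{i\alpha}$ with a constant noise level, strategies are the logit (softmax) image of the scores, and the payoff matrix, step size and all names are hypothetical choices rather than anything taken from the analysis above.

```python
import math
import random

# Row player's payoffs PAYOFF[own][other]; 0 = cooperate, 1 = defect.
# Defecting beats cooperating by at least 1 against any opponent strategy.
PAYOFF = [[3.0, 0.0],
          [5.0, 1.0]]


def softmax(scores):
    """Logit choice rule: X_a proportional to exp(U_a), computed stably."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def simulate(eta=2.0, horizon=200.0, dt=0.01, seed=0):
    """Euler-Maruyama scheme for the noisy scores of both players."""
    rng = random.Random(seed)
    scores = [[0.0, 0.0], [0.0, 0.0]]  # scores[i][a] plays the role of U_{i,a}
    for _ in range(int(horizon / dt)):
        x = [softmax(s) for s in scores]  # current mixed strategies
        for i in (0, 1):
            other = x[1 - i]
            for a in (0, 1):
                payoff = sum(PAYOFF[a][b] * other[b] for b in (0, 1))
                noise = eta * math.sqrt(dt) * rng.gauss(0.0, 1.0)
                scores[i][a] += payoff * dt + noise
    return [softmax(s) for s in scores]


if __name__ == "__main__":
    final = simulate()
    for i, x in enumerate(final):
        # -log X_{i,0} is the cross entropy G_alpha(t) of the dominated strategy.
        print(i, x[0], -math.log(x[0]))
```

Even though the noise level here is twice the minimal payoff gap, the share of the dominated strategy collapses, in line with the linear growth $v_i t$ of the cross entropy against fluctuations of order $\sqrt{t \log\log t}$.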

As in [11], we can now obtain the following estimate for the lifespan of pure
dominated strategies:

Proposition 4.2. Let $X(t)$ be an interior solution path of (3.4) with initial condition
$X(0) = x \in \mathrm{Int}(\Delta)$ and let $P_x$ denote its law. Assume further that the strategy
$\alpha \in \mathcal{S}_i$ is dominated; then, for any $M > 0$ and for $t$ large enough, we have:

(4.12)  $P_x\{X_{i\alpha}(t) < e^{-M}\} \geq \frac{1}{2}\operatorname{erfc}\Big(\frac{M - h_i(x_i) - v_i t}{2\eta_i\sqrt{S_i t}}\Big)$

where $S_i = |\mathcal{S}_i|$ is the number of strategies available to player $i$, $\eta_i = \max\{|\eta_{i\beta}(y)| :
y \in \Delta, \beta \in \mathcal{S}_i\}$ and the constants $v_i > 0$ and $h_i(x_i)$ do not depend on $t$.

Proof. The proof is pretty straightforward and for the most part follows [11]. Sure
enough, if $\alpha \prec p_i \in \Delta_i$ and we use the same notation as in the proof of proposition
4.1, we have:

$-\log X_{i\alpha}(t) = G_\alpha(t) \geq G_\alpha(t) - G_{p_i}(t) \geq H(\alpha, x) - H(p_i, x) + v_i t + \widetilde{W}_i(\rho_i(t))$

(4.13)  $= h_i(x_i) + v_i t + \widetilde{W}_i(\rho_i(t))$

where $v_i := \min_{x_{-i}}\{u_i(x_{-i}; p_i) - u_i(x_{-i}; \alpha)\}$ is positive and
$h_i(x_i) := H(\alpha, x) - H(p_i, x) = -\log x_{i\alpha} + \sum_\beta p_{i\beta} \log x_{i\beta}$. Then:

$P_x\{X_{i\alpha}(t) < e^{-M}\} \geq P_x\{\widetilde{W}_i(\rho_i(t)) > M - h_i(x_i) - v_i t\}$

(4.14)  $= \frac{1}{2}\operatorname{erfc}\Big(\frac{M - h_i(x_i) - v_i t}{\sqrt{2\rho_i(t)}}\Big)$

and, since the quadratic variation $\rho_i(t)$ is bounded above by $2 S_i \eta_i^2 t$ (eq. (4.10)), the
estimate (4.12) holds for all sufficiently large $t$ (i.e. such that $M < h_i(x_i) + v_i t$). □
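For a feel of the numbers involved, the right-hand side of the estimate (4.12), i.e. $\tfrac{1}{2}\operatorname{erfc}\big((M - h_i(x_i) - v_i t)/(2\eta_i\sqrt{S_i t})\big)$, can be evaluated directly with the standard library. The sketch below is only an illustration: the function name and the parameter values are hypothetical, and the bound is meaningful only in the regime $M < h_i(x_i) + v_i t$ singled out in the proof.

```python
import math


def extinction_bound(t, M, h, v, eta, S):
    """Lower bound (4.12) for the probability that X_{i,alpha}(t) < exp(-M).

    t   : time (must be positive)
    M   : extinction threshold exponent
    h   : the constant h_i(x_i) fixed by the initial condition
    v   : minimal payoff gap v_i > 0 of the dominated strategy
    eta : bound eta_i on the noise coefficients
    S   : number S_i of pure strategies of the player
    """
    return 0.5 * math.erfc((M - h - v * t) / (2.0 * eta * math.sqrt(S * t)))


if __name__ == "__main__":
    for t in (10.0, 50.0, 100.0, 500.0):
        print(t, extinction_bound(t, M=10.0, h=0.0, v=1.0, eta=1.0, S=2))
```

With these illustrative values the bound equals exactly one half at $t = 10$ (where $M = h_i + v_i t$) and then climbs towards 1 as $t$ grows.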

Some remarks are now in order: first and foremost, our results should be contrasted
to those of Cabrales [10] and Imhof [11] where dominated strategies die out
only if the noise coefficients (shocks) $\eta_{i\alpha}$ satisfy certain tameness conditions. The
origin of this notable difference is the form of the replicator equation (3.4) and, in
particular, the extra terms that are propagated there by exponential learning and
which are absent from the aggregate shocks dynamics (3.7). As can be seen from
the derivations in proposition 4.1, these terms are precisely the ones that allow players
to pick up on the true payoffs $u_{i\alpha}$ instead of the modified ones $\tilde{u}_{i\alpha} = u_{i\alpha} + \tfrac{1}{2}\eta_{i\alpha}^2$
that come up in [11,12] (and, indirectly, in [10] as well).

Footnote: It should be noted here again that the single-population dynamics studied in [11] are even
further differentiated from (3.4) by the fact that they are cubic and not quadratic.

Secondly, it turns out that the way that the noise coefficients $\eta_{i\beta}$ depend on the
profile $x \in \Delta$ is not really crucial: as long as $\eta_{i\beta}(x)$ is continuous (or essentially
bounded), our arguments are not affected. The only way in which a specific dependence
influences the extinction of dominated strategies is seen in proposition 4.2:
a sharper estimate of the quadratic variation of the noise integrals
$\int_0^t \eta_{i\beta}(X(s))\,dW_{i\beta}(s)$ could conceivably yield a more accurate estimate for the
cumulative distribution function of (4.12).

Finally, it is only natural to ask if proposition 4.1 can be extended to strategies
that are only iteratively dominated. As it turns out, this is indeed the case:

Theorem 4.3. Let $X(t)$ be an interior solution path of (3.4) starting at $X(0) =
x \in \mathrm{Int}(\Delta)$. Then, if $q_i \in \Delta_i$ is iteratively dominated:

(4.15)  $\lim_{t\to\infty} d_{KL}(q_i, X_i(t)) = \infty$  almost surely,

i.e. only rationally admissible strategies survive in the long run.

Proof. As in the deterministic case [7], the main idea is that the solution path $X(t)$
gets progressively closer to the faces of $\Delta$ that are spanned by the pure strategies
which have not yet been eliminated. Following [10], we will prove this by induction
on the rounds of elimination of dominated strategies; proposition 4.1 is simply the
case $n = 1$.

To wit, let $A_i \subseteq \Delta_i$, $A_{-i} \subseteq \Delta_{-i}$ and denote by $\mathrm{Adm}(A_i, A_{-i})$ the set of strategies
$q_i \in A_i$ that are admissible (i.e. not dominated) with respect to any strategy
$q_{-i} \in A_{-i}$. So, if we start with $\mathcal{M}_i^0 := \Delta_i$ and $\mathcal{M}_{-i}^0 := \prod_{j \neq i} \mathcal{M}_j^0$, we may define
inductively the set of strategies that remain admissible after $n$ elimination rounds by
$\mathcal{M}_i^n := \mathrm{Adm}(\mathcal{M}_i^{n-1}, \mathcal{M}_{-i}^{n-1})$ where $\mathcal{M}_{-i}^{n-1} := \prod_{j \neq i} \mathcal{M}_j^{n-1}$; similarly, the pure strategies
that have survived after $n$ such rounds will be denoted by $\mathcal{S}_i^n := \mathcal{S}_i \cap \mathcal{M}_i^n$. Clearly,
this sequence forms a descending chain $\mathcal{M}_i^0 \supseteq \mathcal{M}_i^1 \supseteq \ldots$ and the set $\mathcal{M}_i^\infty := \bigcap_{n=0}^\infty \mathcal{M}_i^n$
will consist precisely of the strategies of player $i$ that are rationally admissible.

Assume then that the cross entropy $G_{q_i}(t) = H(q_i, X_i(t)) = -\sum_\alpha q_{i\alpha} \log X_{i\alpha}(t)$
diverges as $t \to \infty$ for all strategies $q_i \notin \mathcal{M}_i^k$ that die out within the first $k$ rounds;
in particular, if $\alpha \notin \mathcal{S}_i^k$ this implies that $X_{i\alpha}(t) \to 0$ as $t \to \infty$. We will show that
the same is true if $q_i$ survives for $k$ rounds but is eliminated in the subsequent one.

Indeed, if $q_i \in \mathcal{M}_i^k$ but $q_i \notin \mathcal{M}_i^{k+1}$, there will exist some $q'_i \in \mathcal{M}_i^{k+1}$ such that:

(4.16)  $u_i(x_{-i}; q'_i) > u_i(x_{-i}; q_i)$  for all $x_{-i} \in \mathcal{M}_{-i}^k$.

Now, note that any $x_{-i} \in \Delta_{-i}$ can be decomposed as $x_{-i} = x_{-i}^{adm} + x_{-i}^{dom}$ where $x_{-i}^{adm}$
is the "admissible" part of $x_{-i}$, i.e. the projection of $x_{-i}$ on the subspace spanned by
the surviving vertices $\mathcal{S}_{-i}^k = \prod_{j \neq i} \mathcal{S}_j^k$. Hence, if $v_i = \min\{u_i(\alpha_{-i}; q'_i) - u_i(\alpha_{-i}; q_i) :
\alpha_{-i} \in \mathcal{S}_{-i}^k\}$, we will have $v_i > 0$ and, by linearity:

(4.17)  $u_i(x_{-i}^{adm}; q'_i) - u_i(x_{-i}^{adm}; q_i) \geq v_i > 0$,  for all $x_{-i} \in \Delta_{-i}$.

Moreover, by the induction hypothesis, we also have $X_{-i}^{dom}(t) \to 0$ as $t \to \infty$. Thus,
there exists some $t_0$ such that:

(4.18)  $\big|u_i(X_{-i}^{dom}(t); q_i) - u_i(X_{-i}^{dom}(t); q'_i)\big| < v_i/2$

for all $t \geq t_0$ (recall that $X_{-i}^{dom}(t)$ is spanned by already eliminated strategies).
Therefore, as in the proof of proposition 4.1, we obtain for $t \geq t_0$:

(4.19)  $G_{q_i}(t) - G_{q'_i}(t) \geq M + \frac{1}{2} v_i t + \sum_\beta (q'_{i\beta} - q_{i\beta}) \int_0^t \eta_{i\beta}(X(s))\,dW_{i\beta}(s)$

where $M$ is a constant depending only on $t_0$. In this way, the same reasoning as
before gives $\lim_{t\to\infty} G_{q_i}(t) = \infty$ and the theorem follows. □
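The inductive construction of the surviving sets can be mimicked computationally in a simplified setting. The sketch below handles two-player games and, as a stated simplification, only tests domination of pure strategies by other pure strategies, whereas the admissibility operator of the proof also allows domination by mixed strategies (which would require a linear program). All names and the example payoffs are hypothetical.

```python
def strictly_dominated(own, other, payoff):
    """Pure strategies in `own` that some other surviving pure strategy beats
    against every surviving opponent strategy in `other`."""
    return {a for a in own
            if any(all(payoff(b, o) > payoff(a, o) for o in other)
                   for b in own if b != a)}


def iterated_elimination(A, B):
    """Simultaneous rounds of elimination in a bimatrix game.

    A[r][c] is the row player's payoff and B[r][c] the column player's.
    Returns the surviving rows, surviving columns and the number of rounds.
    """
    rows = set(range(len(A)))
    cols = set(range(len(A[0])))
    rounds = 0
    while True:
        dead_rows = strictly_dominated(rows, cols, lambda r, c: A[r][c])
        dead_cols = strictly_dominated(cols, rows, lambda c, r: B[r][c])
        if not dead_rows and not dead_cols:
            return rows, cols, rounds
        rows -= dead_rows
        cols -= dead_cols
        rounds += 1


if __name__ == "__main__":
    # A small dominance-solvable example: rows (U, D), columns (L, M, R).
    A = [[1, 1, 0],
         [0, 0, 2]]
    B = [[0, 2, 1],
         [3, 1, 0]]
    print(iterated_elimination(A, B))
```

In this example the column R falls in the first round, the row D in the second and the column L in the third, leaving the single profile (U, M); this is the kind of dominance-solvable situation in which the elimination rounds single out one pure profile.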

As a result, if there exists only one rationally admissible strategy, we get:

Corollary 4.4. Let $X(t)$ be an interior solution path of the replicator equation
(3.4) for some dominance-solvable game $\mathfrak{G}$ and let $x_0 \in \mathcal{S}$ be the (unique) strict
equilibrium of $\mathfrak{G}$. Then:

(4.20)  $\lim_{t\to\infty} X(t) = x_0$  almost surely,

i.e. players converge to the game's strict equilibrium (a.s.).

In concluding this section, it is important to note that all our results on the
extinction of dominated strategies remain true in the adjusted dynamics (3.4') as
well: this is just a matter of rescaling. The only difference from using different
learning rates $\lambda_i$ comes about in proposition 4.2 where the estimate (4.12) becomes

(4.21)  $P_x\{X_{i\alpha}(t) < e^{-M}\} \geq \frac{1}{2}\operatorname{erfc}\Big(\frac{M - h_i(x_i) - \lambda_i v_i t}{2\lambda_i\eta_i\sqrt{S_i t}}\Big)$

As it stands, this is not a significant difference in itself because the two estimates
are asymptotically equal for large times. Nonetheless, it is this very lack of contrast
that clashes with the deterministic setting where faster learning rates accelerate the
emergence of rationality. The reason for this gap is that an increased learning rate
$\lambda_i$ also carries a commensurate increase in the noise coefficients $\eta_i$, and thus deflates
the benefits of accentuating payoff differences. In fact, as we shall see in the next
sections, the learning rates do not really allow players to learn any faster as much
as they help diminish their shortsightedness: by effectively being lazy, it turns out
that players are better able to average out the noise.

5. Congestion Games: a Suggestive Digression 

Having established that irrational choices die out in the long run, we turn now
to the question of whether equilibrial play is stable in the stochastic replicator
dynamics of exponential learning. However, before tackling this issue in complete
generality, it will be quite illustrative to pay a visit to the class of congestion games
where the presence of a potential simplifies things considerably. In this way, the
results we obtain here should be considered as a motivating precursor to the general
case analysed in section 6.

5.1. Congestion Games. To begin with, it is easy to see that the potential $V$
of definition 2.2 is a Lyapunov function for the deterministic replicator dynamics.
To wit, assume that player $i \in \mathcal{N}$ is learning at a rate $\lambda_i > 0$ and let $x(t)$ be a
solution path of the rate-adjusted dynamics (2.11). Then, a simple differentiation
of $V(x(t))$ gives:

$\frac{dV}{dt} = \sum_{i,\alpha} \frac{\partial V}{\partial x_{i\alpha}} \frac{dx_{i\alpha}}{dt} = -\sum_{i,\alpha} u_{i\alpha}(x)\,\lambda_i x_{i\alpha}\big(u_{i\alpha}(x) - u_i(x)\big)$

(5.1)  $= -\sum_i \lambda_i \Big(\sum_\alpha x_{i\alpha} u_{i\alpha}^2(x) - u_i^2(x)\Big) \leq 0$

where the last step stems from Jensen's inequality; recall that $\partial V / \partial x_{i\alpha} = -u_{i\alpha}(x)$ by
the defining property of the potential and also that $u_i(x) = \sum_\alpha x_{i\alpha} u_{i\alpha}(x)$. In particular, this
implies that the trajectories $x(t)$ are attracted to the local minima of $V$, and since
these minima coincide with the strict equilibria of the game, we painlessly infer
that strict equilibrial play is asymptotically stable in (2.11).
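For completeness, the Jensen step behind the sign of $dV/dt$ can be spelled out as a variance identity. The display below is only a restatement in the notation used here, with $u_i(x) = \sum_\alpha x_{i\alpha} u_{i\alpha}(x)$ and $\sum_\alpha x_{i\alpha} = 1$:

```latex
\sum_\alpha x_{i\alpha}\, u_{i\alpha}^2(x) - u_i^2(x)
  = \sum_\alpha x_{i\alpha}\,\bigl(u_{i\alpha}(x) - u_i(x)\bigr)^2 \;\geq\; 0 ,
```

with equality if and only if all the pure strategies in the support of $x_i$ earn the same payoff. Each player's contribution to $dV/dt$ is thus minus $\lambda_i$ times the variance of his payoffs under his own mixed strategy.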

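It is therefore reasonable to ask whether similar conclusions can be drawn in
the noisy setting of (3.4'). Mirroring the deterministic case, a promising way to
go about this question is to consider again the potential function $V$ of the game
and try to show that it is stochastically Lyapunov in the sense of definition 2.3.
Indeed, if $q_0 = (e_{1,0}, \ldots, e_{N,0}) \in \Delta$ is a local minimum of $V$ (and hence, a strict
equilibrium of the underlying game), we may assume without loss of generality that
$V(q_0) = 0$ so that $V(x) > 0$ in a neighbourhood of $q_0$. We are thus left to examine
the negativity condition of definition 2.3, i.e. whether there exists some $k > 0$ such
that $LV(x) \leq -kV(x)$ for all $x$ sufficiently close to $q_0$.

To that end, recall that $\partial V / \partial x_{i\alpha} = -u_{i\alpha}$ and that $\partial^2 V / \partial x_{i\alpha}^2 = 0$. Then, the generator $L$
of the rate-adjusted dynamics (3.4') applied to $V$ produces:

(5.2)  $LV(x) = -\sum_{i,\alpha} \lambda_i x_{i\alpha} u_{i\alpha}(x)\big(u_{i\alpha}(x) - u_i(x)\big) - \sum_{i,\alpha} \frac{\lambda_i^2}{2} x_{i\alpha} u_{i\alpha}(x)\Big[\eta_{i\alpha}^2 (1 - 2x_{i\alpha}) - \sum_\beta \eta_{i\beta}^2 x_{i\beta}(1 - 2x_{i\beta})\Big]$

where, for simplicity, we have assumed that the noise coefficients $\eta_{i\alpha}$ are constant.

So, let $\varepsilon_i > 0$ and consider the perturbed strategies $x_i = (1 - \varepsilon_i) e_{i,0} + \varepsilon_i y_i$ with
$y_i$ belonging to the face of $\Delta_i$ that lies opposite to $e_{i,0}$ (i.e. $y_{i\mu} \geq 0$, $\mu = 1, 2, \ldots$
and $\sum_\mu y_{i\mu} = 1$). After a series of calculations, we obtain:

$\sum_\alpha x_{i\alpha} u_{i\alpha}(x)\big(u_{i\alpha}(x) - u_i(x)\big)$

$= \varepsilon_i u_{i,0}^2(q_0) + \varepsilon_i \sum_\mu y_{i\mu}\big(u_{i\mu}^2(q_0) - 2 u_{i\mu}(q_0) u_{i,0}(q_0)\big) + O(\varepsilon^2)$

(5.3a)  $= \varepsilon_i \sum_\mu y_{i\mu} (\Delta u_{i\mu})^2 + O(\varepsilon^2)$

where $\Delta u_{i\mu} = u_{i,0}(q_0) - u_{i\mu}(q_0) > 0$; also:

(5.3b)  $\sum_\alpha x_{i\alpha} u_{i\alpha}(x)\Big[\eta_{i\alpha}^2 (1 - 2x_{i\alpha}) - \sum_\beta \eta_{i\beta}^2 x_{i\beta}(1 - 2x_{i\beta})\Big] = -\varepsilon_i \sum_\mu y_{i\mu} \Delta u_{i\mu} \big(\eta_{i\mu}^2 + \eta_{i,0}^2\big) + O(\varepsilon^2)$

and, finally:

(5.3c)  $V(x) = \sum_i \varepsilon_i \sum_\mu y_{i\mu} \Delta u_{i\mu} + O(\varepsilon^2)$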

where $\varepsilon^2 = \sum_i \varepsilon_i^2$. Therefore, if we combine equations (5.3a)-(5.3c), the condition
$LV(x) \leq -kV(x)$ becomes:

(5.4)  $\sum_i \lambda_i \varepsilon_i \sum_\mu y_{i\mu} \Delta u_{i\mu} \Big[\Delta u_{i\mu} - \frac{\lambda_i}{2}\big(\eta_{i\mu}^2 + \eta_{i,0}^2\big)\Big] \geq k \sum_i \varepsilon_i \sum_\mu y_{i\mu} \Delta u_{i\mu} + O(\varepsilon^2)$;

and if $\Delta u_{i\mu} > \frac{\lambda_i}{2}\big(\eta_{i\mu}^2 + \eta_{i,0}^2\big)$ for all $\mu \in \mathcal{S}_i \setminus \{0\}$, this last inequality will be satisfied
for some $k > 0$ whenever $\varepsilon$ is small enough. Essentially, we have proven:

Proposition 5.1. Let $q = (\alpha_1, \ldots, \alpha_N)$ be a strict equilibrium of a congestion game
$\mathfrak{G}$ with potential function $V$ and assume that $V(q) = 0$. Assume further that the
learning rates $\lambda_i$ are sufficiently small so that, for all $\mu \in \mathcal{S}_i \setminus \{\alpha_i\}$ and all $i \in \mathcal{N}$:

(5.5)  $V(q_{-i}; \mu) > \frac{\lambda_i}{2}\big(\eta_{i\mu}^2 + \eta_{i,\alpha_i}^2\big)$.

Then $q$ is stochastically asymptotically stable in the rate-adjusted dynamics (3.4').

Footnote: As mentioned before, to avoid unnecessary complications, we plead guilty to a slight abuse
of terminology in assuming that all equilibria in pure strategies are also strict.

Footnote: Strictly speaking, since our analysis is constrained on $\Delta$, the "neighbourhoods" of definitions
2.3 and 2.4 should be taken to mean "neighbourhoods in $\Delta$", i.e. neighbourhoods in the subspace
topology that $\Delta$ inherits from its ambient Euclidean space. This minor point should always be
clear from the context.

We thus see that no matter how loud the noise $\eta_i$ might be, stochastic stability
is always guaranteed if the players choose a learning rate that is slow enough as
to allow them to average out the noise (i.e. $\lambda_i < \Delta V_i / \eta_i^2$). Of course, it can be
argued here that it is highly unrealistic to expect players to be able to estimate the
amount of Nature's interference and choose a suitably small rate $\lambda_i$. On top of that,
the very form of the condition (5.5) is strongly reminiscent of the "modified" game
of [11, 12], a similarity which seems to contradict our statement that exponential
learning favours rational reactions in the original game. The catch here is that
condition (5.5) is only sufficient and proposition 5.1 merely highlights the role
of a potential function in a stochastic environment. As we shall see in section 6,
nothing stands in the way of choosing a different Lyapunov candidate and dropping
requirement (5.5) altogether.
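To see what the sufficient condition of proposition 5.1 asks of a player in practice, one can solve it for the learning rate. Reading (5.5) as $V(q_{-i}; \mu) > \tfrac{\lambda_i}{2}(\eta_{i\mu}^2 + \eta_{i,\alpha_i}^2)$ for every deviation $\mu$, the admissible rates are those below $\min_\mu 2V(q_{-i}; \mu)/(\eta_{i\mu}^2 + \eta_{i,\alpha_i}^2)$. The helper below is a hypothetical illustration of this arithmetic (its name and inputs are assumptions, not part of the analysis).

```python
def learning_rate_threshold(potential_gaps, eta_deviations, eta_equilibrium):
    """Supremum of the learning rates lambda_i allowed by condition (5.5).

    potential_gaps  : values V(q_{-i}; mu) > 0, one per deviation mu
    eta_deviations  : noise coefficients eta_{i,mu} of those deviations
    eta_equilibrium : noise coefficient of the equilibrium strategy alpha_i
    """
    return min(2.0 * gap / (eta ** 2 + eta_equilibrium ** 2)
               for gap, eta in zip(potential_gaps, eta_deviations))


if __name__ == "__main__":
    # Two possible deviations with potential gaps 1.0 and 0.5.
    print(learning_rate_threshold([1.0, 0.5], [1.0, 2.0], 1.0))
```

In this example the binding constraint comes from the second deviation, which combines the smaller gap with the louder noise, so the player must learn at a rate below 0.2.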



5.2. The Dyadic Case. To gain some further intuition into why the condition
(5.5) is redundant, it will be particularly helpful to examine the case where players
compete for the resources of only two facilities (i.e. $\mathcal{S}_i = \{0, 1\}$, $i \in \mathcal{N}$) and try
to learn the game with the help of the uniform replicator equation (3.4). This
is the natural setting for the El Farol bar problem [21] and the ensuing minority
game [18] where players choose to "buy" or "sell" and are rewarded when they are
in the minority - buyers in a sellers' market or sellers in an abundance of buyers.

As has been shown in [22], these games always possess strict equilibria, even
when players have distinct payoff functions. So, by relabelling indices if necessary,
let us assume that $q_0 = (e_{1,0}, \ldots, e_{N,0})$ is such a strict equilibrium and set $x_i \equiv x_{i,0}$.
Then, the generator of the replicator equation (3.4) takes the form:

(5.6)  $L = \sum_i x_i(1 - x_i)\Big[\Delta u_i(x) + \frac{1}{2}(1 - 2x_i)\,\eta_i^2(x)\Big]\frac{\partial}{\partial x_i} + \frac{1}{2}\sum_i x_i^2 (1 - x_i)^2\,\eta_i^2(x)\,\frac{\partial^2}{\partial x_i^2}$

where now $\Delta u_i \equiv u_{i,0} - u_{i,1}$ and $\eta_i^2 = \eta_{i,0}^2 + \eta_{i,1}^2$.

It thus appears particularly appealing to introduce a new set of variables $y_i$ such
that $\frac{\partial}{\partial y_i} = x_i(1 - x_i)\frac{\partial}{\partial x_i}$; this is just the "logit" transformation: $y_i \equiv \operatorname{logit} x_i =
\log\frac{x_i}{1 - x_i}$. In these new variables, (5.6) assumes the astoundingly suggestive guise:

(5.7)  $L = \sum_i \Big(\Delta u_i\,\frac{\partial}{\partial y_i} + \frac{1}{2}\eta_i^2\,\frac{\partial^2}{\partial y_i^2}\Big)$

which reveals that the noise coefficients can be effectively decoupled from the
payoffs. We can then take advantage of this by letting $L$ act on the function
$f(y) = \sum_i e^{-a_i y_i}$ ($a_i > 0$):

(5.8)  $Lf(y) = -\sum_i a_i\Big(\Delta u_i - \frac{1}{2} a_i \eta_i^2\Big) e^{-a_i y_i}.$

Indeed, if $a_i$ is chosen small enough so that $\Delta u_i - \frac{1}{2} a_i \eta_i^2 \geq m_i > 0$ for all sufficiently
large $y_i$ (recall that $\Delta u_i(q_0) > 0$ since $q_0$ is a strict equilibrium), we get:

(5.9)  $Lf(y) \leq -\sum_i a_i m_i e^{-a_i y_i} \leq -k f(y)$

where $k = \min_i\{a_i m_i\} > 0$. And since $f$ is strictly positive for $y_i > 0$ and
only vanishes as $y \to \infty$ (i.e. at the equilibrium $q_0$), a trivial modification of the
stochastic Lyapunov method (see e.g. pp. 314-315 of [15]) yields:

Proposition 5.2. The strict equilibria of minority games are stochastically asymptotically
stable in the uniform replicator equation (3.4).
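The decoupling in the logit coordinates is easy to probe numerically. For a single player with frozen coefficients, the generator reduces to $L = \Delta u\,\partial_y + \tfrac{1}{2}\eta^2\,\partial_y^2$, and its action on $f(y) = e^{-ay}$ should equal $-a(\Delta u - \tfrac{1}{2}a\eta^2)e^{-ay}$. The sketch below compares this closed form with a finite-difference evaluation; treating $\Delta u$ and $\eta$ as constants is an assumption made purely for illustration, and the function names are hypothetical.

```python
import math


def generator_fd(f, y, du, eta, step=1e-4):
    """Finite-difference value of (du * f' + 0.5 * eta^2 * f'') at the point y."""
    d1 = (f(y + step) - f(y - step)) / (2.0 * step)
    d2 = (f(y + step) - 2.0 * f(y) + f(y - step)) / step ** 2
    return du * d1 + 0.5 * eta ** 2 * d2


def generator_closed_form(y, du, eta, a):
    """One-player version of (5.8): -a * (du - a * eta^2 / 2) * exp(-a * y)."""
    return -a * (du - 0.5 * a * eta ** 2) * math.exp(-a * y)


if __name__ == "__main__":
    du, eta, y = 1.0, 2.0, 1.5
    for a in (0.3, 0.8):
        numeric = generator_fd(lambda z: math.exp(-a * z), y, du, eta)
        print(a, numeric, generator_closed_form(y, du, eta, a))
```

With $\Delta u = 1$ and $\eta = 2$, the exponent $a = 0.3$ lies below $2\Delta u/\eta^2 = 0.5$ and gives a negative value, whereas $a = 0.8$ gives a positive one; this is why $a_i$ has to be chosen small enough for the Lyapunov argument to go through.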

Remark 5.2.1. It is trivial to see that strict equilibria of minority games will also be
stable in the rate-adjusted dynamics (3.4'): in that case we simply need to choose
$a_i$ such that $\Delta u_i - \frac{1}{2} a_i \lambda_i \eta_i^2 \geq m_i > 0$.

Remark 5.2.2. A closer inspection of the calculations leading to proposition 5.2
reveals that nothing hinges on the minority mechanism per se: it is (5.7) that is
crucial to our analysis and $L$ takes this form whenever the underlying game is a
dyadic one (i.e. $|\mathcal{S}_i| = 2$ for all $i \in \mathcal{N}$). In other words, proposition 5.2 also holds
for all games with 2 strategies and should thus be seen as a significant extension of
proposition 5.1.

Proposition 5.3. The strict equilibria of dyadic games are stochastically asymptotically
stable in the replicator dynamics of exponential learning.

6. Stability of Equilibrial Play 

In deterministic environments, the "folk theorem" of evolutionary game theory
provides some pretty strong ties between equilibrial play and stability: strict
equilibria are asymptotically stable in the multi-population replicator dynamics
(2.9) [8]. In our stochastic setting, we have already seen that this is always true in
two important classes of games: those that can be solved by iterated elimination of
dominated strategies (corollary 4.4) and dyadic ones (proposition 5.3).

Although interesting in themselves, these results clearly fall short of adding up
to a decent analogue of the folk theorem for stochastically perturbed games. Nevertheless,
they are quite strong omens in that direction and such expectations are
vindicated in the following:

Theorem 6.1. The strict equilibria of a game $\mathfrak{G}$ are stochastically asymptotically
stable in the replicator dynamics (3.4), (3.4') of exponential learning.



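Footnote: Instead, if players have more than 2 strategies, the not-so-convenient analogue of (5.7) is

$Lf = \sum_i \Big[u_{i,0} - \sum_\mu y_{i\mu} u_{i\mu} - \frac{1}{2}\sum_\mu y_{i\mu}(1 - y_{i\mu})\,\eta_{i\mu}^2\Big]\frac{\partial f}{\partial y_{i,0}} + \frac{1}{2}\sum_i \Big[\eta_{i,0}^2 + \sum_\mu y_{i\mu}^2\,\eta_{i\mu}^2\Big]\frac{\partial^2 f}{\partial y_{i,0}^2}$

where $y_{i,0} = \operatorname{logit} x_{i,0}$, $y_{i\mu} = x_{i\mu}/(1 - x_{i,0})$ and we are assuming that $f$ does not depend on the
$y_{i\mu}$ variables.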
Before proving theorem 6.1 we should first take a slight detour in order to
properly highlight some of the issues at hand. On that account, assume again
that the profile $q_0 = (e_{1,0}, \ldots, e_{N,0})$ is a strict equilibrium of $\mathfrak{G}$. Then, if $q_0$ is to
be stochastically stable, say in the uniform dynamics (3.4), one would expect the
strategy scores $U_{i,0}$ of player $i$ to grow much faster than the scores $U_{i\mu}$, $\mu \in \mathcal{S}_i \setminus \{0\}$
of his other strategies. This is captured remarkably well by the "adjusted" scores:

(6.1a)  $Z_{i,0} = \lambda_i U_{i,0} - \log\Big(\sum_\mu e^{\lambda_i U_{i\mu}}\Big)$

(6.1b)  $Z_{i\mu} = \lambda_i (U_{i\mu} - U_{i,0})$

where $\lambda_i > 0$ is a sensitivity parameter akin (but not identical) to the learning rates
of equation (3.4') (the choice of common notation is fairly premeditated though).

Clearly, whenever $Z_{i,0}$ is large, $U_{i,0}$ will be much greater than any other score
$U_{i\mu}$ and hence, the strategy $0 \in \mathcal{S}_i$ will be employed by player $i$ far more often. To
see this in more detail, it is convenient to introduce the variables:

(6.2a)  $Y_{i,0} := e^{Z_{i,0}} = \frac{e^{\lambda_i U_{i,0}}}{\sum_\nu e^{\lambda_i U_{i\nu}}}$

(6.2b)  $Y_{i\mu} := \frac{e^{Z_{i\mu}}}{\sum_\nu e^{Z_{i\nu}}} = \frac{e^{\lambda_i U_{i\mu}}}{\sum_\nu e^{\lambda_i U_{i\nu}}}$

where the sums run over $\nu \in \mathcal{S}_i \setminus \{0\}$, $Y_{i,0}$ is a measure of how close $X_i$ is to $e_{i,0} \in \Delta_i$
and $(Y_{i,1}, Y_{i,2}, \ldots)$, which lies in the simplex over $\mathcal{S}_i \setminus \{0\}$, is
a direction indicator; the two sets of coordinates are then related by the transformation
$Y_{i\alpha} = X_{i\alpha}^{\lambda_i} / \sum_\mu X_{i\mu}^{\lambda_i}$, $\alpha \in \mathcal{S}_i$, $\mu \in \mathcal{S}_i \setminus \{0\}$. Consequently, to show that the
strict equilibrium $q_0 = (e_{1,0}, \ldots, e_{N,0})$ is stochastically asymptotically stable in the
replicator equation (3.4), it essentially suffices to show that $Y_{i,0}$ diverges to infinity
as $t \to \infty$ with arbitrarily high probability.

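To that end, after some calculations in the $Y_{i\alpha}$ coordinates, Itô's lemma gives
(with all sums over $\mu, \nu$ running over $\mathcal{S}_i \setminus \{0\}$):

(6.3a)  $dY_{i,0} = \lambda_i Y_{i,0}\Big[u_{i,0} - \sum_\mu Y_{i\mu} u_{i\mu}\Big]\,dt + \frac{\lambda_i^2}{2} Y_{i,0}\Big[\eta_{i,0}^2 - \sum_\mu Y_{i\mu}(1 - 2Y_{i\mu})\,\eta_{i\mu}^2\Big]\,dt + \lambda_i Y_{i,0}\Big[\eta_{i,0}\,dW_{i,0} - \sum_\mu \eta_{i\mu} Y_{i\mu}\,dW_{i\mu}\Big]$

(6.3b)  $dY_{i\mu} = \lambda_i Y_{i\mu}\Big[u_{i\mu} - \sum_\nu Y_{i\nu} u_{i\nu}\Big]\,dt + \frac{\lambda_i^2}{2} Y_{i\mu}\Big[\eta_{i\mu}^2 (1 - 2Y_{i\mu}) - \sum_\nu \eta_{i\nu}^2 Y_{i\nu}(1 - 2Y_{i\nu})\Big]\,dt + \lambda_i Y_{i\mu}\Big[\eta_{i\mu}\,dW_{i\mu} - \sum_\nu \eta_{i\nu} Y_{i\nu}\,dW_{i\nu}\Big]$

where we have suppressed the arguments of $u_i$ and $\eta_i$ in order to reduce notational
clutter. In fact, this last SDE is particularly revealing: roughly speaking, we see
that if $\lambda_i$ is chosen small enough, the deterministic term $u_{i,0} - \sum_\mu Y_{i\mu} u_{i\mu}$ will
dominate the rest (cf. with the "soft" learning rates of proposition 5.1). And, since
we know that strict equilibria are asymptotically stable in the deterministic case,
it is plausible to expect the SDE (6.3) to behave in a similar fashion.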



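Proof of theorem 6.1. Tying in with our previous discussion, we will establish
stochastic asymptotic stability of strict equilibria in the dynamics (3.4) by looking at
the processes $Y_i = (Y_{i,0}, Y_{i,1}, \ldots)$ of equation (6.2), where $Y_{i,0} > 0$ and $(Y_{i,1}, Y_{i,2}, \ldots)$ lies in
the simplex over $\mathcal{S}_i \setminus \{0\}$. In these coordinates, we just need to show that for every
$M_i > 0$, $i \in \mathcal{N}$ and any $\varepsilon > 0$, there
exist $Q_i > M_i$ such that if $Y_{i,0}(0) > Q_i$, then, with probability greater than $1 - \varepsilon$,
$\lim_{t\to\infty} Y_{i,0}(t) = \infty$ and $Y_{i,0}(t) > M_i$ for all $t \geq 0$. In the spirit of the previous
section, we will accomplish this with the help of the stochastic Lyapunov method.

Our first task will be to calculate the generator of the diffusion $Y = (Y_1, \ldots, Y_N)$,
i.e. the second order differential operator:

(6.4)  $L = \sum_i \sum_\alpha b_{i\alpha}(y)\,\frac{\partial}{\partial y_{i\alpha}} + \frac{1}{2}\sum_i \sum_{\alpha,\beta} \big(\sigma_i(y)\sigma_i^T(y)\big)_{\alpha\beta}\,\frac{\partial^2}{\partial y_{i\alpha}\,\partial y_{i\beta}}$

where $b_i$ and $\sigma_i$ are the drift and diffusion coefficients of the SDE (6.3) respectively.
In particular, if we restrict our attention to sufficiently smooth functions of the
form $f(y) = \sum_{i \in \mathcal{N}} f_i(y_{i,0})$, the application of $L$ yields:

(6.5)  $Lf(y) = \sum_i \lambda_i y_{i,0}\Big[u_{i,0} - \sum_\mu u_{i\mu} y_{i\mu} + \frac{\lambda_i}{2}\Big(\eta_{i,0}^2 - \sum_\mu y_{i\mu}(1 - 2y_{i\mu})\,\eta_{i\mu}^2\Big)\Big]\frac{\partial f_i}{\partial y_{i,0}} + \frac{1}{2}\sum_i \lambda_i^2 y_{i,0}^2\Big[\eta_{i,0}^2 + \sum_\mu \eta_{i\mu}^2 y_{i\mu}^2\Big]\frac{\partial^2 f_i}{\partial y_{i,0}^2}$

Therefore, let us consider the function $f(y) = \sum_i 1/y_{i,0}$ for $y_{i,0} > 0$. With
$\partial f_i / \partial y_{i,0} = -1/y_{i,0}^2$ and $\partial^2 f_i / \partial y_{i,0}^2 = 2/y_{i,0}^3$, equation (6.5) becomes:

(6.6)  $Lf(y) = -\sum_i \frac{\lambda_i}{y_{i,0}}\Big[u_{i,0} - \sum_\mu u_{i\mu} y_{i\mu} - \frac{\lambda_i}{2}\Big(\eta_{i,0}^2 + \sum_\mu y_{i\mu}\,\eta_{i\mu}^2\Big)\Big]$

However, since $q_0 = (e_{1,0}, \ldots, e_{N,0})$ has been assumed to be a strict Nash equilibrium
of $\mathfrak{G}$, we will have $u_{i,0}(q_0) > u_{i\mu}(q_0)$ for all $\mu \in \mathcal{S}_i \setminus \{0\}$. Then, by continuity, there
exists some positive constant $v_i > 0$ with $u_{i,0} - \sum_\mu u_{i\mu} y_{i\mu} \geq v_i > 0$ whenever $y_{i,0}$ is
large enough (recall that $\sum_\mu y_{i\mu} = 1$). So, if we set $\eta_i = \max\{|\eta_{i\beta}(x)| : x \in \Delta, \beta \in
\mathcal{S}_i\}$, the noise term in (6.6) is at most $\lambda_i \eta_i^2$ (again because $\sum_\mu y_{i\mu} = 1$). Hence, if we
pick positive $\lambda_i$ with $\lambda_i \leq v_i / (2\eta_i^2)$, we get:

(6.7)  $Lf(y) \leq -\sum_i \frac{\lambda_i v_i}{2 y_{i,0}} \leq -\frac{1}{2}\min_i\{\lambda_i v_i\}\, f(y)$

for all sufficiently large $y_{i,0}$. Moreover, $f$ is strictly positive for $y_{i,0} > 0$ and vanishes
only as $y_{i,0} \to \infty$. Hence, as in the proof of proposition 5.2, our claim follows on
account of $f$ being a (local) stochastic Lyapunov function.

Finally, in the case of the rate-adjusted replicator dynamics (3.4'), the proof is
similar and only entails a rescaling of the parameters $\lambda_i$. □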
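The change of coordinates used in the proof, and the Lyapunov candidate it leads to, are straightforward to evaluate. The sketch below assumes the transformation $Y_{i\alpha} = X_{i\alpha}^{\lambda_i} / \sum_{\mu \neq 0} X_{i\mu}^{\lambda_i}$ and the candidate $f = \sum_i 1/Y_{i,0}$, with the equilibrium strategy stored at index 0 of each mixed strategy; the function names and the sample profiles are hypothetical.

```python
def adjusted_coordinates(x, lam):
    """Map a mixed strategy x = (x_0, x_1, ...) to (Y_0, [Y_1, Y_2, ...]).

    Y_0 measures proximity to the vertex e_0 (it blows up as x_0 -> 1), while
    the remaining coordinates form a direction indicator that sums to one.
    """
    tail = sum(xm ** lam for xm in x[1:])
    return x[0] ** lam / tail, [xm ** lam / tail for xm in x[1:]]


def lyapunov_candidate(profile, lams):
    """f = sum_i 1 / Y_{i,0}, evaluated at a profile of mixed strategies."""
    return sum(1.0 / adjusted_coordinates(x, lam)[0]
               for x, lam in zip(profile, lams))


if __name__ == "__main__":
    far = [[0.90, 0.06, 0.04]]
    near = [[0.99, 0.005, 0.005]]
    print(adjusted_coordinates(far[0], 0.5))
    print(lyapunov_candidate(far, [0.5]), lyapunov_candidate(near, [0.5]))
```

As expected, the candidate takes smaller values at the profile that is closer to the vertex $e_{i,0}$, and it tends to zero only as $Y_{i,0} \to \infty$.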

Remark 6.1.1. If we trace our steps back to the coordinates $x_{i\alpha}$, our Lyapunov
candidate takes the form $f(x) = \sum_i \big(x_{i,0}^{-\lambda_i} \sum_\mu x_{i\mu}^{\lambda_i}\big)$. It thus begs to be compared
to the Lyapunov function $\sum_\mu x_\mu^\lambda$ employed by Imhof and Hofbauer in [12] to derive
a conditional version of theorem 6.1 in the evolutionary setting. In fact, the obvious
extension $f(x) = \sum_i \sum_\mu x_{i\mu}^{\lambda_i}$ also works in our case, but the calculations are much
more cumbersome and they are also shorn of their ties to the adjusted scores (6.1).

Remark 6.1.2. It is also important to highlight the dual role that the learning
rates $\lambda_i$ play in our analysis. In the logistic learning model (2.10) they measure the
players' convictions and how strongly they react to a given stimulus (the scores $U_{i\alpha}$);
in this role, they are fixed at the outset of the game and form an intrinsic part of the
replicator dynamics (3.4'). On the other hand, they also make a virtual appearance
as free temperature parameters in the adjusted scores (6.1), to be softened until we
get the desired result. For this reason, even though theorem 6.1 remains true for
any choice of learning rates, the function $f(x) = \sum_i x_{i,0}^{-\lambda_i} \sum_\mu x_{i\mu}^{\lambda_i}$ is Lyapunov only
if the sensitivity parameters $\lambda_i$ are small enough. It might thus seem unfortunate
that we chose the same notation in both cases, but we feel that our decision is
justified by their intimate relation.

7. Discussion 

Our aim in this last section will be to discuss a number of important issues that 
we have not been able to address thoroughly in the rest of the paper; truth be told, 
a good part of this discussion can be seen as a roadmap for future research. 

Ties with Evolutionary Game Theory. In single-population evolutionary models,
an evolutionarily stable strategy (ESS) is a strategy which is robust against
invasion by mutant phenotypes [3]. Strategies of this kind can be considered as a
stepping stone between mixed and strict equilibria and they are of such significance
that it makes one wonder why they have not been included in our analysis.

The reason for this omission is pretty simple: even the weakest evolutionary
criteria in multi-population models tend to reject all strategies which are not strict
Nash equilibria [8]. Therefore, since our learning model (2.9) corresponds exactly
to the multi-population environment (2.14), we lose nothing by concentrating our
analysis only on the strict equilibria of the game. If anything, this equivalence
between ESS and strict equilibria in multi-population settings further highlights
the importance of the latter.

However, this also brings out the gulf between the single-population setting and
our own, even when we restrict ourselves to 2-player games (which are the norm in
single-population models). Indeed, the single-population version of the dynamics
(3.7) is:

(7.1)  $dX_\alpha = X_\alpha\Big[\big(u_\alpha(X) - u(X, X)\big) - \big(\eta_\alpha^2 X_\alpha - \sum_\beta \eta_\beta^2 X_\beta^2\big)\Big]\,dt + X_\alpha\Big[\eta_\alpha\,dW_\alpha - \sum_\beta \eta_\beta X_\beta\,dW_\beta\Big]$

As it turns out, if a game possesses an interior ESS and the shocks are mild
enough, the solution paths $X(t)$ of the (single-population) replicator dynamics will
be recurrent (theorem 2.1 in [11]). Theorem 6.1 rules out such behaviour in the
case of strict equilibria (the multi-population analogue of ESS), but does not answer
the following question: if the underlying game only has mixed equilibria, will the
solution $X(t)$ of the dynamics (3.4) be recurrent?

This question is equivalent to showing that a profile $x$ is stochastically asymptotically
stable in the replicator equations (3.4), (3.4') only if it is a strict equilibrium.
Since theorem 6.1 provides the converse "if" part, an answer in the positive would
yield a strong equivalence between stochastically stable states and strict equilibria;
we leave this direction to be explored in the future.

Itô vs. Stratonovich. For comparison purposes (but also for simplicity), let us
momentarily assume that the noise coefficients $\eta_{i\alpha}$ do not depend on the state $X(t)$
of the game. In that case, it is interesting (and very instructive) to note that the
SDE (3.1) remains unchanged if we use Stratonovich integrals instead of Itô ones:

(7.2)  $dU_{i\alpha}(t) = u_{i\alpha}(X(t))\,dt + \eta_{i\alpha}\,\partial W_{i\alpha}(t).$

Then, after a few calculations, the corresponding replicator equation reads:

(7.3)  $dX_{i\alpha} = X_{i\alpha}\big[u_{i\alpha}(X) - u_i(X)\big]\,dt + X_{i\alpha}\Big[\eta_{i\alpha}\,\partial W_{i\alpha} - \sum_\beta \eta_{i\beta} X_{i\beta}\,\partial W_{i\beta}\Big]$

The form of this last equation is remarkably suggestive. First, it highlights
the role of the modified game $\tilde{u}_{i\alpha} = u_{i\alpha} + \tfrac{1}{2}\eta_{i\alpha}^2$ even more crisply than equation
(3.4): the payoff terms are completely decoupled from the noise, in contrast to
what one obtains by introducing Stratonovich perturbations in the evolutionary
setting [12,23]. Secondly, one can seemingly use this simpler equation to get a much
more transparent proof of proposition 4.1: the estimates for the cross entropy terms
$G_{q_i} - G_{q'_i}$ are recovered almost immediately from the Stratonovich dynamics. However,
since (7.3) takes this form only for constant coefficients $\eta_{i\alpha}$ (the general case is
quite a bit uglier), we chose the route of consistency and employed Itô integrals
throughout our paper.